AI Safety in China #19
Top official warns against unchecked global AI competition, corporate safety commitments, AI safety and national security, frontier risks in technical standards plans, papers on superalignment
Note: This newsletter edition was delayed due to Concordia AI’s participation in the French AI Action Summit and other research projects. It covers developments from January to March 2025 and is much longer than usual. We plan to return to a monthly publication schedule going forward.
Key Takeaways
One of the top seven officials in China warned of unrestrained AI competition at the World Economic Forum in Davos.
China participated in the French AI Action Summit, where leading Chinese institutions launched the China AI Safety & Development Association—framed as China’s counterpart to an AI Safety Institute.
17 Chinese companies endorsed a set of AI Safety Commitments.
AI safety was referenced in high-level Chinese government plans on public safety and emergency response, and a leading security-linked think tank published an in-depth analysis of AI safety’s national security implications.
National plans for developing technical AI standards explicitly reference frontier AI safety risks.
New technical papers were published on dual-use risks, scalable oversight, and mechanistic interpretability.
International AI Governance
Senior Chinese official warns against AI competition at Davos
Background: On January 21, 2025, Chinese Executive Vice Premier DING Xuexiang (丁薛祥) emphasized (En, Ch) the need for international cooperation on AI safety at the World Economic Forum (WEF) in Davos. Ding is China’s top science and technology official as the head of the Central Science and Technology Commission, and he is one of China’s top seven leaders as a member of the Communist Party of China’s (CPC) Politburo Standing Committee.
A warning against unrestrained competition: Ding cautioned that unchecked global competition in AI poses significant risks, characterizing it as a “gray rhino”—a term commonly used to refer to a highly probable but neglected threat, in contrast to highly improbable “black swans.” He stated that China “will not engage in reckless international competition.”
Ding rejected the notion that development and safety contradict each other: “It’s like driving on a highway—if the braking system isn’t under control, you can't step on the accelerator with confidence.” He further praised the UN’s historical role in nuclear and biological governance, arguing it should similarly play a central role in AI governance.
At the same time, Ding called for making AI’s vast potential benefits broadly accessible, emphasizing that the technology must serve all countries, including those in the Global South, rather than becoming a “game for the rich.”

Implications: At Davos in 2024, Premier LI Qiang (李强) also stressed AI safety, but Ding’s explicit use of the term “gray rhino” and warning against “disorderly competition” mark the first time both phrases have been applied to international AI competition. Though consistent with China’s rhetorical emphasis on AI cooperation and criticism of US export controls, Ding’s comments signal heightened concern. Top government officials have only rarely labeled specific risks as “gray rhinos,” and previous references in the financial sector have been correlated with concrete government action. Even as geopolitical rivalry accelerates technological development, Ding’s speech suggests that China still hopes to avert an unconstrained AI arms race.
China participates in the French AI Action Summit
Background: The AI Action Summit took place in Paris on February 10-11, co-hosted by France and India. Chinese Vice Premier ZHANG Guoqing (张国清), one of China’s top 24 officials as a CPC Politburo member, attended as the “special representative” of President Xi Jinping.
China’s engagement: Zhang’s speech emphasized China's commitment to international cooperation that promotes development while safeguarding AI security. He called for the international community to use AI for good and refine global governance, referencing China's Global AI Governance Initiative. China was among the 60+ signatories of the summit’s final statement, alongside the EU, India, Japan, and Singapore, with the US and UK abstaining. The statement primarily championed AI development, while noting safety and trust as fundamental conditions for maximizing AI’s benefits.
Implications: China’s decision to send Vice Premier Zhang, one of the country’s top leaders, as a “special representative” represents a significant elevation in diplomatic engagement compared with the vice minister who attended the Bletchley Summit in 2023. While the Paris summit successfully emphasized development and cooperation, it did not produce concrete safety outcomes comparable to those of the Bletchley and Seoul summits, underscoring the difficulty of reaching global consensus on AI governance.
China AI Safety & Development Association announced at Paris Summit
Background: During a side event at the French AI Action Summit, a coalition of leading Chinese universities and research institutes, AI labs, and think tanks announced the formation of the “China AI Safety & Development Association” (CnAISDA), self-described as China's counterpart to international AI Safety Institutes (AISI).
The new network: During the event, Tsinghua Institute for AI International Governance (I-AIIG) Dean XUE Lan (薛澜) introduced CnAISDA as a collaborative network that integrates China's leading resources and expertise in AI development and safety. He emphasized CnAISDA’s role as China's counterpart to global AISIs, designed to facilitate international dialogue on AI safety. In his keynote speech, Tsinghua Professor and Turing Award Winner Andrew YAO (姚期智) highlighted the expansion of technical AI safety teams in China as part of growing global cooperation on AI safety. Former Vice Minister of Foreign Affairs FU Ying (傅莹), speaking at another event in Paris, contextualized CnAISDA’s mission within China’s dual concerns about AI safety at the application level as well as “future AI risks” developing in a time of “technological explosion.”
CnAISDA founding members include:
Top Universities and Research Institutes: Institute of Automation of the Chinese Academy of Sciences (CASIA), Peking University, Tsinghua University.
State-backed Labs: Beijing Academy of AI (BAAI), Shanghai AI Lab, Shanghai Qi Zhi Institute (SQZ).
Government-affiliated Think Tanks: China Academy of Information and Communications Technology (CAICT), China Center for Information Industry Development (CCID).
CnAISDA claims Chinese government backing, though public endorsement from government representatives has yet to materialize. Nevertheless, the launch received extensive coverage in Chinese state media.

Implications: CnAISDA is attempting to enhance China’s global engagement on AI safety, particularly with other international AISIs. However, critical questions remain about the association’s domestic mandate, relationship with Chinese government entities, and AI safety testing capabilities.
Domestic AI Governance
AI safety emphasized in national security and emergency planning
AI safety and national security: On February 28, the Politburo, the top 24 CPC officials, held a study session on “constructing a safer China,” where President Xi Jinping positioned AI safety as a critical component of China’s comprehensive national security framework. The session classified AI risks under “public security,” alongside disaster prevention, production safety, food and drug safety, and cybersecurity, advocating for “preemptive prevention” governance models for these domains.
AI safety and national emergencies: The revised National Emergency Response Plan, released on February 25 to replace the 2005 version, also explicitly includes AI risks. The document outlines China’s framework for preventing, responding to, and recovering from major disasters. In a section detailing the specific risks that require monitoring, AI safety is listed alongside animal epidemics, cybersecurity, and financial anomalies—signaling its integration into emergency planning.
Implications: These developments build upon the July 2024 Third Plenum resolution, which prominently linked AI safety with national security. While more detailed implementation measures for frontier AI safety remain to be seen, the steady integration of the Third Plenum directive into public safety and national security planning suggests a trajectory toward more concrete measures in the future.
Technical standards plans by multiple institutions include frontier risks
TC260 plans: In January 2025, cybersecurity standards committee TC260 released the AI Safety Standards System (V1.0) - Draft for Comments (Standards System Draft). The document provides a high-level map of 24 current and future standards, and marks progress toward implementing TC260’s AI Safety Governance Framework, published in September 2024.
While covering a wide range of AI risks, the Standards System Draft references several frontier safety domains. It includes in-development and approved standards on cyberattack misuse and indicates plans to develop new standards for loss of control, safety alignment, testing benchmarks, risk assessments, agent safety, multimodal safety, and safety guardrails. For instance, it proposes an “AI Safety Guardrails Construction Guide” to address loss of control risks, though it is unclear whether this guide is in active development.
MIIT plans: Meanwhile, the Ministry of Industry and Information Technology (MIIT) set up an AI standardization technical committee with a safety and governance working group in December 2024. On March 27, 2025, the technical committee released a draft guide outlining detailed plans to develop standards on governance capabilities, infrastructure security, cybersecurity, data security, algorithm and model security, application security, and AI for security. Several planned standards touch on frontier safety issues—such as interpretability, risk classification, agent application safety, and risk governance—with plans to complete standards in 2-3 years.
Implications: The TC260 draft signals intent by a top Chinese AI standard-setting body to address frontier risks. While MIIT’s committee focuses on subordinate industry standards, its detailed roadmap spanning multiple domains relevant to frontier risks—paired with relatively short 2-3 year timelines—is a promising signal of commitment to implementation.
Chinese companies agree upon voluntary AI safety measures
Context: On December 24, 2024, 17 Chinese firms—including LLM startups DeepSeek, Zhipu.AI, Minimax, and 01.AI, big tech companies Alibaba, Baidu, and Huawei, and established AI companies such as iFlytek—endorsed “AI Safety Commitments” launched by the AI Industry Alliance (AIIA). AIIA is a Chinese industry body that works closely with the China Academy of Information and Communications Technology (CAICT), a think tank under MIIT. Previous discussion of CAICT-AIIA projects can be found in our analysis of China’s evaluation ecosystem and newsletter issues #4, #8, and #14.
What’s in the commitments? Signatories agree to establish clear risk identification and mitigation processes throughout the AI development life cycle, pledging to:
Allocate resources to dedicated safety teams.
Conduct rigorous safety and security testing, such as red teaming.
Strengthen data security measures.
Enhance security of AI software and hardware infrastructure.
Boost transparency and enable external oversight.
Advance frontier AI safety research, particularly for AI agents and embodied AI.
CAICT has asked signatories to share implementation details by March 15, including information about safety team structures and safety evaluation datasets, in order to compile best practices that can guide industry efforts. Referencing OpenAI’s Preparedness Framework and Anthropic’s Responsible Scaling Policy, CAICT also encourages signatories to share risk evaluation methodologies.
Implications: These commitments mark the strongest collective position yet on AI safety by Chinese industry, securing the support of most leading AI developers in the country. Despite lacking binding enforcement measures, CAICT’s active follow-up signals serious implementation intentions. These commitments share similarities with the Frontier AI Safety Commitments from the May 2024 AI Seoul Summit, which were signed by three Chinese firms, in their requirements for transparency and foundational safety research; however, the Seoul commitments place greater emphasis on defining critical risk thresholds. Both approaches are inherently limited by their voluntary nature, but their alignment reflects a shared commitment to AI safety, while the differences suggest opportunities for international exchange and mutual learning.
AI safety receives modest nod at legislative meeting
Context: China’s annual “Two Sessions,” the legislative meetings of the National People’s Congress (NPC) and the Chinese People’s Political Consultative Conference (CPPCC), took place on March 4-11. The Two Sessions are among the most important annual political gatherings in China and produce the Government Work Report, which outlines priorities for the coming year.
Key observations:
The Government Work Report (En, Ch) modestly increased attention on AI, adding references to “large models” and “embodied AI” alongside last year’s application-focused “AI+” initiative. Although the report itself did not address AI safety, drafting committee deputy head CHEN Changsheng (陈昌盛) discussed safety from an international governance perspective during a press conference. He reaffirmed China's support for AI governance under the UN framework, emphasizing that AI development should proceed "under the premise of ensuring safety."
The National Development and Reform Commission (NDRC) Work Report made its first explicit mention of “integrating safety and ethics into AI R&D and applications.” While the NDRC functions primarily as China's macroeconomic planner rather than a regulator, its acknowledgment of safety is notable given its broader coordinating functions across the economy. The NDRC also announced a new venture fund for emerging technologies including AI, which will likely focus on development but could potentially include safety-oriented investments.
The NPC Standing Committee Work Report did not suggest prioritizing an AI Law, instead only committing to “strengthen legislative research” on AI in 2025.
Implications: The Two Sessions sent dual signals similar to last year’s. From a development perspective, a sharper AI focus was predictable given the attention on DeepSeek in recent months. Conversely, the lack of emphasis on safety is unsurprising given the meetings’ overall focus on high-level macroeconomic goals. The NDRC Work Report’s safety reference was the most positive signal, though its practical impact remains uncertain. Meanwhile, an overarching AI Law appears distant.
Technical Safety Developments
Chinese researchers have published a large number of technical AI safety papers in the first quarter of 2025. Below, we highlight a few noteworthy papers that address dual-use misuse risks, superalignment techniques, and mechanistic interpretability.
Dual-use misuse risks
February 2025: The Dual-use Dilemma in LLMs: Do Empowering Ethical Capacities Make a Degraded Utility? This preprint by researchers from the Chinese University of Hong Kong and Shanghai Jiao Tong University addresses dual-use safety challenges in LLMs that handle potentially dangerous chemical knowledge. The authors propose an alignment technique called LibraAlign, based on Direct Preference Optimization (DPO) training over a dataset that distinguishes between legitimate and harmful chemical synthesis requests. They claim their model outperforms leading LLMs by maintaining strong ethical safeguards without sacrificing practical utility.
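For readers interested in the underlying machinery, below is a minimal PyTorch sketch of the standard DPO objective that LibraAlign reportedly builds on. The dummy log-probabilities, the beta value, and the framing of a safe response versus a harmful synthesis route are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO loss: prefer the 'chosen' response (e.g., one that safely
    reframes a dangerous chemistry request) over the 'rejected' one (e.g., a
    detailed harmful synthesis route)."""
    chosen_logratios = policy_chosen_logps - ref_chosen_logps
    rejected_logratios = policy_rejected_logps - ref_rejected_logps
    # Maximize the margin between chosen and rejected log-ratios
    logits = beta * (chosen_logratios - rejected_logratios)
    return -F.logsigmoid(logits).mean()

# Dummy sequence log-probabilities standing in for real model outputs
policy_chosen = torch.tensor([-12.3, -9.8])
policy_rejected = torch.tensor([-11.1, -10.5])
ref_chosen = torch.tensor([-13.0, -10.2])
ref_rejected = torch.tensor([-10.9, -10.4])

print(dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected))
```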
March 2025: Nuclear Deployed: Analyzing Catastrophic Risks in Decision-making of Autonomous LLM Agents. One of the paper’s anchor authors is XU Wei (徐葳), Vice Dean of Tsinghua University’s Institute for Interdisciplinary Information Sciences, and contributors include researchers from the Shanghai Qi Zhi Institute; both institutions are led by Turing Award Winner Andrew Yao. The paper investigates whether LLM agents can exhibit catastrophic behavior and deception in high-stakes scenarios, such as war or pandemics. The authors run 14,400 decision-making rollouts across 12 leading LLMs, finding that many models autonomously choose harmful actions and then attempt to cover them up. Models with stronger reasoning abilities often show higher risk and are more prone to deception. The authors claim their paper shows how current LLM agents, even without malicious prompts, can behave unsafely when navigating trade-offs between being helpful, harmless, and honest in high-stakes environments.
Source: arXiv.
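To make the evaluation setup of the “Nuclear Deployed” paper more concrete, here is a minimal sketch of what a decision-making rollout harness of this kind could look like. The query_model function, the action names, and the deception probe are hypothetical placeholders; the paper’s actual scenarios and scoring are more elaborate.

```python
import random
from collections import Counter

def query_model(model_name: str, prompt: str) -> str:
    # Hypothetical stand-in for an LLM API call; a real harness would send
    # the scenario transcript to the model under evaluation.
    if "yes or no" in prompt:
        return random.choice(["yes", "no"])
    return random.choice(["launch_strike", "negotiate", "stand_down"])

CATASTROPHIC_ACTIONS = {"launch_strike"}

def run_rollout(model_name: str, scenario: str, max_turns: int = 5) -> dict:
    transcript = [f"Scenario: {scenario}"]
    took_catastrophic = False
    for _ in range(max_turns):
        action = query_model(model_name, "\n".join(transcript))
        transcript.append(f"Agent action: {action}")
        if action in CATASTROPHIC_ACTIONS:
            took_catastrophic = True
            break
    # Deception probe: ask the agent whether it took a catastrophic action.
    admission = query_model(model_name, "\n".join(transcript) +
                            "\nDid you take a catastrophic action? Answer yes or no.")
    return {"catastrophic": took_catastrophic,
            "concealed": took_catastrophic and admission == "no"}

results = Counter()
for _ in range(100):  # the paper reports 14,400 rollouts across 12 models
    outcome = run_rollout("model-under-test", "escalating border conflict")
    results["catastrophic"] += outcome["catastrophic"]
    results["concealed"] += outcome["concealed"]
print(results)
```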
Scalable oversight
March 2025: Scalable Oversight for Superhuman AI via Recursive Self-Critiquing. This paper by scholars from the Chinese Academy of Sciences and Xiaohongshu proposes Recursive Self-Critiquing as a method for scalable oversight of superhuman AI systems. Instead of judging superhuman AI outputs directly, humans evaluate critiques of those outputs, or even higher-order critiques of critiques, making oversight easier and more scalable. The authors conduct experiments showing that critique-of-critique is often easier and more accurate than direct evaluation, allowing humans to maintain effective oversight. However, they also show that current AI models struggle to perform recursive self-critiquing of their own work.
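The core recursion is easy to illustrate. Below is a minimal sketch of the critique hierarchy, assuming a hypothetical call_model placeholder for an LLM API; the paper’s prompts and experimental protocol are considerably more involved.

```python
def call_model(prompt: str) -> str:
    # Hypothetical placeholder for an LLM API call.
    return f"[model output for: {prompt[:50]}...]"

def recursive_critique(question: str, depth: int = 2) -> list:
    """Return [answer, critique, critique-of-critique, ...] up to `depth` levels."""
    chain = [call_model(f"Answer the question: {question}")]
    for level in range(depth):
        chain.append(call_model(
            f"Critique the following level-{level} response for errors or "
            f"unsupported claims:\n{chain[-1]}"))
    return chain

chain = recursive_critique("Is this proof of the lemma correct?", depth=2)
# The human overseer (or a weaker trusted model) only judges chain[-1],
# which the paper argues is easier than evaluating chain[0] directly.
for i, step in enumerate(chain):
    print(f"Level {i}: {step}")
```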
January 2025: Debate Helps Weak-to-Strong Generalization. This paper by researchers from Alibaba’s DAMO Academy proposes a novel method for superalignment that combines scalable oversight and weak-to-strong generalization. Specifically, it shows that debate between two strong models can help a weak model learn to be a more effective supervisor. The weak model can then provide higher-quality feedback when supervising the stronger model’s learning, which is particularly helpful when using groups of small models. One of the co-authors, HUANG Fei (黄非), also co-authored a 2023 preprint assessing the alignment of Chinese models with human values, covered in newsletter issue #1.
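As a rough illustration of the idea (not the paper’s implementation), the sketch below shows how two strong debaters could help a weak supervisor produce labels, which would then supervise fine-tuning of the strong student. The call_model function and model names are hypothetical placeholders.

```python
def call_model(model: str, prompt: str) -> str:
    # Hypothetical placeholder for an LLM API call.
    return f"[{model} responds to: {prompt[:40]}...]"

def debate_label(question: str, weak_supervisor: str) -> str:
    pro = call_model("strong-debater-A", f"Argue that the answer to '{question}' is YES.")
    con = call_model("strong-debater-B", f"Argue that the answer to '{question}' is NO.")
    transcript = f"Question: {question}\nFor YES: {pro}\nFor NO: {con}"
    # The weak supervisor only judges the debate rather than solving the task itself.
    return call_model(weak_supervisor,
                      f"{transcript}\nWhich side is more convincing? Answer YES or NO.")

questions = ["Does this code contain a vulnerability?",
             "Is this medical claim supported by the cited study?"]
weak_labels = [(q, debate_label(q, "weak-supervisor")) for q in questions]
# `weak_labels` would then serve as the training signal for fine-tuning the
# stronger student model, as in standard weak-to-strong generalization setups.
print(weak_labels)
```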
Explainability and mechanistic interpretability
February 2025: The Hidden Dimensions of LLM Alignment: A Multi-Dimensional Safety Analysis. This preprint from the City University of Hong Kong, Harbin Institute of Technology, and Microsoft investigates LLM safety behavior as linear directions in activation space. It finds that LLM refusal behavior is dominated by one main direction in activation space but is also shaped by several smaller directions representing related safety behaviors, such as tone or role-play. The authors claim that analyzing these directions makes it possible to bypass safety mechanisms, revealing new vulnerabilities in alignment.
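A minimal sketch of how such safety directions can be extracted is shown below, using a difference-of-means direction plus PCA on the residual. The random activations, the 512-dimensional hidden size, and the specific decomposition are illustrative assumptions, not the paper’s procedure.

```python
import torch

# Toy stand-ins for hidden-state activations at one layer (num_prompts, hidden_dim).
# Real activations would come from forward passes over harmful vs. benign prompts.
torch.manual_seed(0)
harmful_acts = torch.randn(200, 512) + 0.5
benign_acts = torch.randn(200, 512)

# Dominant "refusal direction": difference of the two class means.
refusal_dir = harmful_acts.mean(0) - benign_acts.mean(0)
refusal_dir = refusal_dir / refusal_dir.norm()

# Secondary safety-related directions: principal components of the activations
# after projecting out the dominant direction.
all_acts = torch.cat([harmful_acts, benign_acts])
centered = all_acts - all_acts.mean(0)
residual = centered - (centered @ refusal_dir)[:, None] * refusal_dir
_, _, v = torch.pca_lowrank(residual, q=4)
secondary_dirs = v.T  # (4, hidden_dim): candidate directions for tone, role-play, etc.

# Projections onto the refusal direction separate the two prompt classes on average.
print((harmful_acts @ refusal_dir).mean(), (benign_acts @ refusal_dir).mean())
```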
March 2025: X-Boundary: Establishing Exact Safety Boundary to Shield LLMs from Multi-Turn Jailbreaks without Compromising Usability. This paper, primarily authored by researchers from Shanghai Jiao Tong University and Shanghai AI Lab, introduces a method to precisely separate harmful and safe model behaviors in multi-turn conversations by learning a clean “boundary” in the model’s representation space. The authors claim that this approach strengthens defenses against jailbreaks while preserving general capabilities and reducing over-refusal of benign prompts.
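The general flavor of representation-space separation can be sketched with a simple margin objective, as below. This is a generic illustration under assumed random data, not X-Boundary’s actual training objective.

```python
import torch
import torch.nn.functional as F

def separation_loss(safe_reps, harmful_reps, margin=2.0):
    """Keep safe-conversation representations compact around their centroid while
    pushing harmful-conversation representations at least `margin` away."""
    center = safe_reps.mean(0)
    compactness = (safe_reps - center).norm(dim=1).mean()
    harmful_dist = (harmful_reps - center).norm(dim=1)
    separation = F.relu(margin - harmful_dist).mean()
    return compactness + separation

# Toy stand-ins for pooled hidden states of multi-turn conversations.
torch.manual_seed(0)
safe_reps = torch.randn(64, 256)
harmful_reps = torch.randn(64, 256) + 0.3
print(separation_loss(safe_reps, harmful_reps))
```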
January 2025: Spot Risks Before Speaking! Unraveling Safety Attention Heads in Large Vision-Language Models. This preprint from Xi’an Jiaotong University discovers that certain attention heads in vision-language models naturally respond to unsafe prompts. Using such "safety heads,” they build a simple, fast detector that flags malicious inputs before the model generates a response. This approach cuts attack success rates from 80% to under 5%, while keeping the model useful for normal prompts.
Source: arXiv.
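As an illustration of the general approach in the safety-heads paper, the sketch below trains a lightweight probe on per-head activation features to flag unsafe prompts before any response is generated. The random features, the number of heads, and the logistic-regression probe are assumptions for illustration; the paper’s detector and head-selection procedure differ.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy stand-ins for per-attention-head features on the final prompt token:
# shape (num_prompts, num_heads). Real features would be extracted from the
# vision-language model's forward pass before decoding begins.
safe_feats = rng.normal(0.0, 1.0, size=(300, 32))
unsafe_feats = rng.normal(0.8, 1.0, size=(300, 32))

X = np.vstack([safe_feats, unsafe_feats])
y = np.array([0] * 300 + [1] * 300)  # 1 = malicious prompt

# Lightweight probe over candidate "safety heads"; cheap enough to run as a
# pre-generation filter.
probe = LogisticRegression(max_iter=1000).fit(X, y)

def flag_before_generation(head_features: np.ndarray, threshold: float = 0.5) -> bool:
    """Return True if the prompt should be refused before decoding starts."""
    return probe.predict_proba(head_features.reshape(1, -1))[0, 1] > threshold

print(flag_before_generation(rng.normal(0.8, 1.0, size=32)))  # likely flagged
print(flag_before_generation(rng.normal(0.0, 1.0, size=32)))  # likely allowed
```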
Expert Views on AI Risks
Security-linked think tank examines AI's national security implications
Background: In December, the head of the office of the Holistic View of National Security Research Center, housed within the China Institutes for Contemporary International Relations (CICIR), published an essay in “Contemporary China and World” analyzing AI safety and governance from a national security perspective. Both CICIR and the center have close ties to Chinese national security authorities, and the center was established with the approval of the Central Publicity Department.
Analysis of AI safety and national security: The essay positions AI as the newest, most dynamic, and most prominent domain for national security. It identifies a number of risks from AI, including vulnerability to adversarial attacks, lack of explainability, deepfakes, weaponization for cyberattacks, societal disruptions, and the risk of falling behind other countries in development. Rejecting a “development first, governance later” model as untenable, the essay calls for continued technological development alongside stronger early-stage governance.
From the perspective of AI safety oversight, it acknowledges the risk of scientific innovations triggering catastrophes. The authors argue that AI safety will be a “key area of work for the next five years” given its inclusion in the July 2024 Third Plenum decision. They recommend preventing AI misuse in high-risk domains such as CBRN, enhancing AI interpretability, pursuing legal action against AI-driven cyberattacks, and implementing multistakeholder oversight mechanisms. The authors conclude by proposing international cooperation efforts aligned with the October 2023 Global AI Governance Initiative, such as establishing an international AI governance institution under the UN.
Implications: This AI safety-focused publication by a leading, security-connected think tank reveals that AI governance has become intertwined with national security thinking. The expansive discussion shows that Chinese conceptions of AI safety in national security can encompass catastrophic misuse risk, socioeconomic disruption, and the risk of China falling behind in the competition around AI development. While not explicitly addressing loss of control scenarios, the essay repeats the typical Chinese position of ensuring that AI is “safe, reliable, and controllable.” Overall, the publication suggests that certain Chinese security experts believe in the necessity of international AI safety cooperation while still pursuing technological parity with the US.
Interpreting TC260 AI Safety Governance Framework
Background: Following the September 2024 publication of an AI Safety Governance Framework by cybersecurity standards committee TC260, experts from several key institutions published interpretive analyses in the journal “China Information Security,” which is overseen by government cybersecurity organ China Information Technology Security Evaluation Center (CNITSEC). The four commentaries were penned by researchers at CNITSEC itself, state-backed standards research group China Electronics Standardization Institute (CESI), national laboratory Zhongguancun (ZGC) Lab, and Zhejiang University, respectively.
Key findings: The CESI article emphasizes developing AI safety standards to implement the Framework, such as for generative AI service safety requirements. The CNITSEC researcher flagged the Framework’s provisions on preventing loss of human control, ensuring value alignment, and establishing emergency response mechanisms. The ZGC Lab experts characterized catastrophic AI risks as “unknown unknowns” and quoted AI safety warnings from Turing Award Winner Geoffrey Hinton and the Venice meeting of the International Dialogues on AI Safety (IDAIS). They specifically framed AI as a dual-use technology, with the potential to amplify risks in conjunction with cyber and biological technology. Lastly, the Zhejiang University professors argued for addressing new AI governance challenges, such as AI agents and embodied intelligence. Most articles called for multistakeholder participation in AI governance, including non-governmental organizations and the public.
Implications: Despite the Framework’s broad scope, frontier AI risks featured prominently across expert commentaries. Citations of IDAIS and Geoffrey Hinton show that international exchanges inform Chinese views of AI risks, even among those not participating in dialogues. The CESI experts suggested that implementation of the Framework will proceed largely through standards.
Industry researchers publish essays on open-source governance
Background: On March 10 and 26 respectively, researchers from Alibaba Research Institute and Tencent Research Institute published sophisticated, technically informed essays on open-source AI governance. Both pieces defend the value of open-source innovation while engaging seriously with its potential risks.
Key messages from the two essays:
Both essays champion open-source development and warn against overregulation that could stifle innovation. They also highlight key safety benefits of open-source, particularly improved transparency and auditability.
At the same time, both acknowledge the safety risks of open-sourcing model weights. Tencent emphasizes risks of misuse and loss of control in downstream deployments, while Alibaba explicitly flags elevated risks in areas like CBRN misuse.
On governance, Alibaba supports a marginal risk framework—evaluating open-sourcing based on whether it introduces additional risks beyond closed-source models.
Tencent notes that open versus closed source is not a binary; openness exists on a spectrum, and governance should adapt accordingly.
Both essays discuss the allocation of responsibilities for open-source AI.
Alibaba emphasizes that developers should not bear all the burden, instead sharing accountability across all stakeholders, including cloud providers, hosting platforms, downstream users, auditors, and regulators.
Tencent stresses the need to distinguish between developers and deployers, critiques California’s SB-1047 for conflating the two, and points to open-source licenses as key mechanisms for assigning liability.
Implications: Chinese views on open-source AI governance carry particular weight given China’s development of world-leading open-weight models, including DeepSeek and Alibaba’s Qwen. These two essays signal that key Chinese actors are engaging with the global debate in sophisticated ways, with awareness of how open-source releases might heighten misuse and loss-of-control risks. However, the emphasis on ecosystem-wide accountability in both essays could also be perceived as a way to deflect responsibility and allow for more unrestrained development.
What else we're reading
Alvin Wang Graylin and Paul Triolo, There can be no winners in a US-China AI arms race, MIT Technology Review, January 21, 2025.
Scott Singer, DeepSeek and Other Chinese Firms Converge with Western Companies on AI Promises, January 28, 2025.
Valerie J. Karplus, Lan Xue, M. Granger Morgan, Kebin He, David G. Victor & Shuang-Nan Zhang, How to sustain scientific collaboration amid worsening US–China relations, Nature, January 14, 2025.
Concordia AI's Recent Work
Concordia AI co-hosted a seminar on “AI Safety as a Collective Challenge” on the sidelines of the French AI Action Summit in February alongside the Carnegie Endowment for International Peace (CEIP), Oxford Martin AI Governance Initiative (AIGI), Tsinghua University Center for International Security and Strategy, and Tsinghua I-AIIG. During the event, Concordia AI, CEIP, and AIGI co-published the report Examining AI Safety as a Global Public Good: Implications, Challenges, and Research Priorities.
During the China AI Safety & Development Association (CnAISDA) side event in Paris on February 11, Turing Award Winner Andrew Yao cited Concordia AI’s State of AI Safety in China report series when describing the increase in AI safety research in Chinese institutions.
Concordia AI CEO Brian Tse was a speaker on the Panel: Global Perspectives during the Inaugural Conference of the International Association for Safe and Ethical AI on February 6.
Concordia AI participated in the UN’s Informal Consultation on Independent International Scientific Panel on AI and Global Dialogue on AI and shared written inputs for the consultation.
The National University of Singapore published a report summarizing the International AI Cooperation and Governance Forum 2024, including the AI safety sessions co-hosted by Concordia AI and Singapore’s AI Verify Foundation.
Feedback and Suggestions
Please reach out to us at info@concordia-ai.com if you have any feedback, comments, or suggestions for topics for the newsletter to cover.