AI Safety in China #20
Politburo studies AI, updated expert AI law draft, AI existential risks as a national security issue, Alibaba’s technical safety measures, and vulnerabilities of multimodal reasoning LLMs
Key Takeaways
China’s top officials in the Politburo held the first dedicated meeting on AI development and safety since 2018.
An updated expert draft AI law added provisions for whistleblower protection and increased the scope for ethics reviews.
A Tongji University IR scholar argued for categorizing AI existential risks as a national security issue.
Alibaba published a new report on its alignment measures, red teaming efforts, and content screening tools.
A new technical paper found that multimodal reasoning LLMs have greater vulnerability to unsafe queries and jailbreak attacks than non-reasoning multimodal LLMs.
Domestic AI Governance
Politburo holds first dedicated AI development and safety meeting in seven years
Background: On April 25, the Politburo of the Communist Party of China (CPC)—China’s 24 top officials—held a study session on AI, and President Xi Jinping delivered a speech (En, Cn). Politburo study sessions are important signals of top political priorities—the last one on AI was in 2018. Noting that AI is “profoundly reshaping people's work and life,” Xi’s speech covered AI development, safety, and international collaboration:
On development: Xi called for achieving fundamental breakthroughs, driving industry applications, and providing policy support:
He underscored the need to secure a competitive edge in AI by improving basic research, achieving self-reliance in advanced chips and foundational software, and leveraging AI for scientific breakthroughs.
China should leverage strengths including abundant data, a complete industrial system, and broad application scenarios to drive application-oriented AI development. Xi also emphasized improving compute infrastructure and data utilization.
He promised comprehensive policy support, including measures on IP rights, tax incentives, government procurement, financing, and talent development.
On safety: Xi noted that, while AI brings “unprecedented development opportunities,” it also brings “unprecedented risks and challenges.” China should expedite the formulation of “laws and regulations, policies and systems, application norms and ethical guidelines.” The readout also calls for “systems for technology monitoring, early risk warning and emergency response” to ensure that AI is safe, reliable, and controllable.
On international governance: Xi observed that AI could serve as a “global public good that benefits humanity” and highlighted the importance of capacity building in the Global South to help close the global AI divide. He also stressed the need for greater international alignment on development strategies, governance rules, and technical standards.
Implications: While the key themes – industrial application-focused AI development, self-reliance, and "safe, reliable, and controllable" AI – are not new, the Politburo study session sends a strong signal of the top leadership's intensified focus on AI development and safety. This could reflect increased interest following the success of DeepSeek; Xi followed up the meeting with an April 29 visit to a Shanghai accelerator space with a significant focus on large model development and application. However, the study session readout discusses AI in generic terms, avoiding references to generative AI, AGI, or other specific types of AI.
The choice of guest lecturer further underscores a broader focus than general-purpose frontier models alone. ZHENG Nanning (郑南宁), a professor and former president (2003-2014) of Xi'an Jiaotong University, is an expert in computer vision, pattern recognition, and advanced computing architectures rather than LLMs, and he has rarely discussed large model development in public. He is also an Academician of the Chinese Academy of Engineering and was a delegate to the 16th, 17th, and 18th CPC National Congresses.
Notably, the mention of "technology monitoring, early risk warning, and emergency response systems" represents one of the most specific discussions of AI safety mechanisms in a high-level document to date, especially compared with the vaguer statements from the 2018 Politburo study session or the 2024 Third Plenum's reference to "instituting oversight systems to ensure AI safety." This also marks the first time the Chinese government has used the phrase "unprecedented risks and challenges" in relation to AI, suggesting heightened concern about AI risks, and it may portend an acceleration of new Chinese domestic standards or regulations on AI safety. On international governance, the readout is the first instance of the Chinese government referring to AI as a "global public good." This reinforces previous Chinese messaging and could herald strengthened efforts on global capacity building.
Minister of State Security addresses AI in national security essay
Background: On April 15, CHEN Yixin (陈一新), China’s Minister of State Security, published an extensive essay on national security in Qiushi, the official theoretical journal of the CPC. The Ministry of State Security (MSS) is China’s civilian intelligence and security service, responsible for foreign intelligence, counterintelligence, and political security.
Discussion of AI: In his essay, Chen addresses a broad range of national security challenges, from terrorism to espionage to Taiwan independence. The essay opens with a discussion of "changes unseen in a century" shaping China's national security landscape. Within this context, Chen devotes a paragraph to technological change in which AI is the only technology specifically mentioned, emphasizing that AI is:
Experiencing “explosive” development, transforming society with unprecedented depth and breadth;
Intensifying global competition, with advanced nations seeking to maliciously block latecomers from catching up;
Increasing technological safety risks, as AI and other new technologies could reshape civilization, global power structures, and social governance models;
Creating a need for strengthened risk assessment and prevention to ensure that technologies are “safe/secure, reliable, and controllable.”
Implications: As discussed in our previous newsletter, the growing attention to AI safety in China's national security planning has been a noteworthy development in early 2025. Chen's essay continues this trend, focusing on the changes and risks brought by AI in a section ostensibly about broader technological issues. The essay also suggests concern about loss of control over AI, repeating the well-known slogan calling for "safe, reliable, and controllable" technology.
Updated AI Law expert draft proposes whistleblower protections
Background: A group of scholars led by ZHOU Hui (周辉), Deputy Director of the Cyber and Information Law Research Office at the Institute of Law of the Chinese Academy of Social Sciences (CASS), released version 3.0 of their AI Model Law on March 29. This iteration follows version 1.0 from August 2023 and version 2.0 from April 2024. The Model Law maintains its previous broad focus on AI development, copyright, privacy, and safety, retaining key provisions such as a licensing system for high-risk applications. Version 3.0 made a few notable refinements, such as defining open-source AI, allowing even copyrighted online data to be used for training open-source models unless creators opt out (Article 21), and including "safe harbor" provisions (Article 81) that shield companies from copyright liability under certain conditions.
Key changes relevant to AI safety:
Whistleblower protection (Article 70): The law seeks to protect whistleblowers who report potential major harms, security risks, or legal violations such as the deliberate distortion of security assessment results. AI developers, providers, and users must implement robust internal reporting mechanisms. Additionally, central government departments in charge of AI are tasked with establishing dedicated channels for whistleblower reports. Institutions are explicitly prohibited from retaliating against whistleblowers, with penalties for violations.
Ethics: The updated version adds ethics to the "general principles" section (Article 11). It also creates a new requirement for public sector organizations researching, providing, or using AI to set up ethics review mechanisms (Article 49), and it encourages the creation of third-party institutions or companies to conduct ethics reviews for SMEs lacking in-house committees.
Dynamic policy review (Article 24): The government must conduct assessments of existing regulations and adjust or abolish rules that hinder innovation or compromise safety.
Edge-AI / on-device AI (Article 62): The law requires heightened transparency and security for AI operating directly on user devices, emphasizing privacy protection.
Implications: The most notable governance addition in this draft is the clause on whistleblower protections, which echoes provisions in draft California AI legislation and suggests potential mutual learning on domestic governance mechanisms. Overall, the updated draft does not significantly refine previous suggestions on AI safety and governance, perhaps reflecting that the authors consider the measures in earlier drafts sufficient. Although this CASS proposal is influential as an expert recommendation, it is not an official government draft, and the timeline for an actual national AI law remains uncertain.
Technical Safety Developments
SafeMLRM: Demystifying Safety in Multi-modal Large Reasoning Models
April 9: Researchers from China and Singapore found that multimodal large reasoning models (MLRMs) – multimodal LLMs that can 'reason' similarly to DeepSeek R1 or OpenAI's o1 – are more susceptible to unsafe queries and jailbreak attacks than their non-reasoning counterparts. One of the senior authors is WANG Xiang (王翔), a professor at the University of Science and Technology of China's Lab of Data Science who focuses on trustworthy AI; researchers from the National University of Singapore and Nanyang Technological University were also involved. The authors tested unsafe prompts and jailbreak attacks on five open-source MLRMs in scenarios such as illegal activities and hate speech, then compared the results against those of the base, non-reasoning multimodal LLMs (a minimal sketch of this comparison follows the findings below). They also open-sourced a toolkit for MLRM safety evaluation. Key findings:
Reasoning tax: MLRMs (e.g. R1-OneVision and Mulberry-Qwen2-VL) show increased vulnerability to unsafe text queries and are 37.44% more likely to be jailbroken than their non-reasoning base MLLMs (e.g. Alibaba's Qwen2-VL).
Safety blind spots: certain scenarios, such as “illegal activity” questions, show significantly heightened (e.g. 25x worse) vulnerability.
Emergent self-correction: more positively, MLRMs show nascent self-correction abilities: in a small proportion of cases (around 16%), models whose reasoning stage is jailbroken still produce safe final answers.
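To illustrate the comparison described above, here is a minimal sketch of measuring and comparing attack success rates, assuming hypothetical helpers query_model (model inference) and is_unsafe (a safety judge over responses); it is not the authors' open-sourced toolkit.

```python
# Minimal sketch: compare jailbreak success rates between a reasoning MLRM
# and its non-reasoning base MLLM. `query_model` and `is_unsafe` are
# hypothetical placeholders for model inference and a safety judge.
from typing import Callable, List


def attack_success_rate(
    query_model: Callable[[str, str], str],  # (model_name, prompt) -> response
    is_unsafe: Callable[[str], bool],        # True if the response is unsafe
    model_name: str,
    unsafe_prompts: List[str],
) -> float:
    """Fraction of unsafe/jailbreak prompts that elicit an unsafe response."""
    hits = sum(1 for p in unsafe_prompts if is_unsafe(query_model(model_name, p)))
    return hits / max(len(unsafe_prompts), 1)


def reasoning_tax(query_model, is_unsafe, reasoning_model, base_model, prompts):
    """Relative increase in attack success rate after adding reasoning."""
    asr_reasoning = attack_success_rate(query_model, is_unsafe, reasoning_model, prompts)
    asr_base = attack_success_rate(query_model, is_unsafe, base_model, prompts)
    return (asr_reasoning - asr_base) / max(asr_base, 1e-9)
```

In the paper's setup, the same unsafe and jailbreak prompt sets are run against each MLRM and its base MLLM, and the relative gap in attack success rate is what the authors call the "reasoning tax."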
Following the Whispers of Values: Unraveling Neural Mechanisms Behind Value-Oriented Behaviors in LLMs
April 7: This paper, authored by researchers from Beijing Foreign Studies University including professor XU Yuemei (徐月梅), takes an approach similar to mechanistic interpretability to explore how "national social values" manifest in LLM neurons. Using China's "Core Socialist Values," which span the national level (prosperity, harmony), societal level (equality, rule of law), and personal level (patriotism, integrity), they create a bilingual benchmark that simulates how individuals would behave if they adhered to particular values. The authors then analyze the activation frequency of neurons associated with specific values and experiment with deactivating those neurons to observe the effect on the models' support for the relevant value (a minimal sketch of this deactivation step appears below).
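As a rough illustration of the deactivation experiment, the sketch below zeroes a hypothetical set of "value neurons" in one transformer MLP layer using a PyTorch forward hook; the layer path and neuron indices are placeholder assumptions, and the paper's actual method selects neurons by their activation frequency on value-laden prompts.

```python
# Sketch of deactivating ("ablating") selected neurons in an LLM layer with a
# PyTorch forward hook. Layer path and `value_neurons` indices are hypothetical.
import torch


def ablate_neurons(module: torch.nn.Module, neuron_indices):
    """Register a hook that zeroes the given output dimensions of `module`."""
    def hook(_module, _inputs, output):
        output = output.clone()            # avoid modifying activations in place
        output[..., neuron_indices] = 0.0  # silence the selected neurons
        return output                      # returned value replaces the layer output
    return module.register_forward_hook(hook)


# Usage sketch (model loading omitted; layer path is an assumption):
# value_neurons = [12, 305, 1024]  # indices found via activation-frequency analysis
# handle = ablate_neurons(model.model.layers[10].mlp.down_proj, value_neurons)
# ablated_logits = model(**inputs).logits  # model behavior with "value neurons" off
# handle.remove()                          # restore the original model
```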

Other relevant technical publications
Renmin University of China and Ant Group, From 1,000,000 Users to Every User: Scaling Up Personalized Preference for User-level Alignment, arXiv preprint, March 19, 2025.
Zhejiang University and Hangzhou Dianzi University, Bridging the Gap Between Preference Alignment and Machine Unlearning, arXiv preprint, April 9, 2025.
Tsinghua University, An Evaluation of Cultural Value Alignment in LLM, arXiv preprint, April 11, 2025.
Expert views on AI Risks
IR scholar urges putting AI existential risks on national security agenda
Background: On February 5, LU Chuanying (鲁传颖), an international relations scholar at Tongji University, published an essay on AI and national security in the People's Tribune journal, which is managed by the CPC's official newspaper People's Daily. We previously covered Lu's work in Issue #4; he participated in Track 2 dialogues with US think tanks on cybersecurity and AI during his earlier tenure at the local government-backed Shanghai Institutes for International Studies. While Lu's essay addresses a broad range of topics, it specifically underscores the existential risks posed by AI.
What stands out:
Lu explores three main dimensions of AI-related national security risks, echoing the taxonomy in the International AI Safety Report: safety defects in the technology itself; misuse and irresponsible deployment; and inadequate social response mechanisms.
Lu discusses the threat of AI-automated cyberattacks and the dangers of incorporating black-box AI systems into critical decision-making frameworks, such as nuclear command and control.
He argues that existential risks and irreversible societal harm caused by AI must be integrated into national security agendas.
He suggests that China should establish an institution that can coordinate on safety issues between the government, industry, and R&D institutions, mentioning the US AI Safety Institute as a potential model.
In a presentation to Alibaba on April 3, Lu argued that despite the growing emphasis on national security, AI safety remains a viable area for China-US collaboration. He highlighted shared concerns, including:
Existential risk: Both countries share concerns about AI alignment and safety, as voiced by scientists like Geoffrey Hinton and Andrew Yao (姚期智). Lu noted: “Beyond national and military security, the more critical risk posed by AI is existential risk.”
Non-proliferation: Both sides have a vested interest in preventing advanced AI from falling into the hands of irresponsible actors.
Tech standardization: There is mutual interest in ensuring the interoperability and standardization of AI technologies.
Lu also suggested innovating diplomatic dialogue formats; for instance, alongside the traditional diplomatic role of the Ministry of Foreign Affairs, industry involvement is crucial for AI.

Implications: As discussed in Issue #19, AI safety is gaining increasing attention in China's national security planning. While official government statements on the topic remain relatively vague, essays like Lu's, published in party-state media, offer insights into discourse trends. Lu's essay raised concerns about AI misuse in cyberattacks and highlighted existential risks, indicating that concerns about frontier AI safety are shared within China. Moreover, Lu's perspective suggests that, despite competitive dynamics, these national security concerns could serve as a foundation for cooperation between the US and China.
Alibaba publishes comprehensive report on safety practice
Background: Alibaba and the China Electronics Standardization Institute (CESI), a standard-setting body under the Ministry of Industry and Information Technology (MIIT), have jointly published a seven-part report on LLMs, with a chapter focused on safety and governance released on April 17. The report provides a comprehensive overview of Alibaba's views on AI safety, including specific examples of the company's safety practices:
During training: Alibaba employs techniques like Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO), with datasets tailored to Chinese regulations and cultural context, as well as a standardized annotation pipeline with detailed rulebooks and activity logging for traceability (a minimal sketch of the DPO objective appears after this list).
Pre-deployment: Alibaba claims to conduct extensive internal red teaming pre-deployment. They developed Moyu, an attack-defense platform simulating real-world deployments, and RedChain, a LangChain-based framework that generates automatic multimodal, multilingual, and/or multi-round attacks.
During content generation: Standard Q&A libraries provide pre-approved, compliant responses for common queries. If no standard answer exists, user queries and model outputs are screened for harmful or non-compliant content (a minimal sketch of this screening flow also follows the list). Alibaba claims to have improved upon Meta's Prompt Guard for input filtering by reducing false positives. The company also uses Constitutional AI with a focus on safety, appropriateness, positivity, and relevance.
Content dissemination: Reporting and response systems for Qwen-based platforms combine automated detection with human oversight to address harmful content.
Application governance: A risk matrix based on industry, ethical sensitivity, technical features, and audience assigns risk levels, allocating oversight resources accordingly. Systems with significant potential impact on public safety, privacy protection, social order, or economic stability would be considered “high risk.”
Misuse prevention: Technical controls are strengthened in sensitive sectors like fintech, education, and healthcare to ensure compliance with industry-specific regulations. The report also argues that licenses for open-source AI may need to impose stricter usage restrictions and ethical requirements than traditional open-source software. These measures should be tailored to the model’s capabilities and associated risk levels.
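For context on the DPO technique named in the training bullet, here is a minimal sketch of the standard DPO objective operating on sequence log-probabilities; it is illustrative only and not Alibaba's training code.

```python
# Minimal sketch of the Direct Preference Optimization (DPO) objective on
# sequence log-probabilities; illustrative only, not Alibaba's training code.
import torch
import torch.nn.functional as F


def dpo_loss(
    policy_chosen_logps: torch.Tensor,    # log pi_theta(y_chosen | x)
    policy_rejected_logps: torch.Tensor,  # log pi_theta(y_rejected | x)
    ref_chosen_logps: torch.Tensor,       # log pi_ref(y_chosen | x)
    ref_rejected_logps: torch.Tensor,     # log pi_ref(y_rejected | x)
    beta: float = 0.1,
) -> torch.Tensor:
    """Push the policy to prefer annotator-approved responses over rejected ones."""
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```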
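And to make the content-generation controls more concrete, below is a minimal sketch of the described flow: serve a pre-approved answer when one exists, otherwise screen the query and the model output before responding. All names (standard_qa, input_guard, output_guard, generate) are hypothetical placeholders rather than Alibaba's actual systems.

```python
# Minimal sketch (not Alibaba's implementation) of generation-time screening:
# serve a vetted answer if available, otherwise screen input and output.
from typing import Callable, Dict

REFUSAL = "Sorry, I can't help with that request."


def safe_generate(
    query: str,
    standard_qa: Dict[str, str],          # pre-approved answers for common queries
    input_guard: Callable[[str], bool],   # True if the query is harmful/non-compliant
    output_guard: Callable[[str], bool],  # True if the output is harmful/non-compliant
    generate: Callable[[str], str],       # the underlying LLM call
) -> str:
    # 1. Standard Q&A library: return a vetted answer if one exists.
    if query in standard_qa:
        return standard_qa[query]
    # 2. Input screening: block clearly harmful or non-compliant queries.
    if input_guard(query):
        return REFUSAL
    # 3. Generate, then screen the output before returning it.
    response = generate(query)
    return REFUSAL if output_guard(response) else response
```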
Implications: This report offers rare transparency into the technical safety practices of a major Chinese AI developer, though Chinese developers still have room to disclose more about their safety measures in line with global best practices. Notably, the report extensively references global AI safety practices and shows that many safety techniques – fine-tuning, red teaming, constitutional AI – are similar between Chinese and Western companies. It also documents how Alibaba builds on international open-source tools, indicating that such tools can foster global diffusion of AI safety knowledge. At the same time, the report makes clear that Alibaba's safety practices mostly focus on compliance with local Chinese regulations and standards, and it lacks discussion of catastrophic safety risks.
Influential government think tank flags AI-bio risks
Background: On March 31, researchers from the Chinese Academy of Science and Technology for Development (CASTED) published an article on post-pandemic biosafety and biosecurity, highlighting AI as a growing risk factor. CASTED is a think tank directly supervised by China’s Ministry of Science and Technology (MOST). Co-author XU Ye (许晔) is deputy head of CASTED’s Institute of Frontier Science and Emerging Technology and participated in the drafting of multiple Five-Year Plans and the 2017 New Generation AI Development Plan.
AI and biosecurity: The article focuses on rising pandemic risks. In the section on anthropogenic risks, it underscores the potential misuse of AI in biotechnology, alongside the risks of genome editing and synthetic biology. The authors warn of several threat vectors:
Advances in AI for protein structure prediction and design could be exploited to create bioweapons or novel toxins;
AI modelling could be used to optimize pathogen spread;
Generative AI could be used to gather and analyze online information in preparation for biological attacks;
AI may be used to develop "smart microorganisms" capable of targeted attacks;
Digital tools used in drug design and biomanufacturing could become targets of attacks.
To mitigate these threats, the authors recommend that China:
Establish a “dual-use technology list,” register relevant institutions and personnel, provide specialized training, and improve safety practices;
Lead in developing international standards for AI-bio safety, such as preventing AI-driven misuse involving synthesis of genetic material;
Strengthen safeguards against terrorist misuse of AI in biology.
Implications: This article indicates continued interest among Chinese policy advisors in the intersection of AI and biological risks, following the pieces covered in Issues #12 and #18. It also raises some novel policy suggestions, such as registering institutions working on dual-use technologies. However, these discussions remain nascent: apart from a standards body's AI Safety Governance Framework, Chinese regulations and standards have paid little attention to the convergence of AI and biosecurity risks. The article cites international investments in biosecurity as a motivator for China to increase its own investment, suggesting that a "race to the top" on AI-bio risk is possible.
What else we’re reading
Is China Racing to AGI?, ChinaTalk, April 2, 2025.
Xiao Qian, Can U.S. and China Rebuild Trust on AI?, China-US Focus, April 3, 2025.
Ben Bucknall et al., In Which Areas of Technical AI Safety Could Geopolitical Rivals Cooperate?, arXiv, April 17, 2025.
Miles Brundage and Grace Werner, America First Meets Safety First, AI Frontiers, April 23, 2025.
Eliot Chen, Freezing China Out of Setting the Rules for AI, the Wire China, April 27, 2025.
Concordia AI’s Recent Work
Concordia AI co-hosted and participated in a series of events alongside the International Conference on Learning Representations (ICLR) machine learning conference.
Concordia AI co-hosted an exclusive session for Singaporean government officials featuring four expert talks on AI safety and governance challenges and solutions.
We also convened 25-30 leading experts for a Misalignment and Control Technical workshop focused on risks of AI deception and loss of control.
This was followed by a 130+ person AI Safety Networking Social, bringing together experts from all around the world.
Concordia AI presented on AI Safety in China at FAR.AI’s Singapore Alignment Workshop 2025.
Feedback and Suggestions
Please reach out to us at info@concordia-ai.com if you have any feedback, comments, or suggestions for topics for the newsletter to cover.