AI Safety in China #12
Foreign Minister Wang Yi on global AI governance, state-backed think tank on AI x bio, high-level government attention to AI, and new papers on LLM agent safety and value alignment
Key Takeaways
Foreign Minister Wang Yi listed safety as one of three key principles that needs to be ensured for AI, mentioning ensuring human control, improving interpretability and predictability, and managing risks.
A researcher at a think tank under China’s State Council published an in-depth analysis of the intersection of AI and biological security risks.
The annual meeting of China’s legislature indicated greater leadership emphasis on AI, but without directly mentioning advanced AI systems and AI safety in the key Government Work Report.
Chinese research groups published papers on LLM agent safety, value alignment across multiple languages, and alignment through negative samples.
International AI Governance
Foreign Minister Wang Yi discusses AI governance at national conference
Background: On March 7, during the “Two Sessions” annual meeting of China’s national legislature,” China’s top foreign policy official WANG Yi (王毅) gave a press briefing (En, Ch).1 While the briefing covered a wide range of issues, Minister Wang answered a question on global AI governance and international cooperation on AI.
AI governance: Wang suggested embracing new opportunities in AI development while highlighting the need for caution, stating that “brakes should be checked before setting off.” He articulated three principles that must be ensured for AI: first, AI as a force for good; second, ensuring safety, including ensuring human control, improving interpretability and predictability, and assessing risks; third, ensuring fairness and setting up an international AI governance institution under the UN framework. Wang also made veiled criticisms of US technology policy towards China, calling the “small yard, high fence” approach “mistakes with historic consequences” that would “only fragment international industrial and supply chains and undercut humanity’s ability to tackle risks and challenges.” Wang also previewed that China would submit a resolution to the UN General Assembly on international cooperation for bridging the AI divide and encouraging technology sharing.
Implications: Minister Wang’s formulation of three principles that must be “ensured” is new. Choosing to focus on safety as the second principle is a strong signal of Chinese government attention to the issue, including mentions of risk management and explainability. The third principle also reiterates China’s support for creating an international institution on AI under the UN, which some experts believe will be necessary for reducing frontier AI risks. Meanwhile, Minister Wang’s critiques of US technology policy towards China also indicate the tension between geopolitics and reducing AI risks; US-China tensions will slow down and may prevent cooperation on reducing AI risks that is even in both countries’ interest.
Domestic AI Governance
Major political meeting sends mixed signals on AI
Background: From March 5 to 11, China held the “Two Sessions” annual meeting of China’s national legislature, involving the National People’s Congress (NPC) and the Chinese People's Political Consultative Conference (CPPCC).2 The key highlights of this event include issuing the yearly Government Work Report, interviews with central government ministers, and updates on legislative efforts.
Key developments:
The Government Work Report (En, Ch) increased emphasis on AI compared to previous years, with the first mention of an “AI+” initiative that is expected to focus on applications of AI in various domains.3 However, there were no specific mentions of AGI, large models, or fundamental AI research.
A number of delegates at the event submitted proposals to the government regarding AI. Many were focused on development, such as iFlytek’s chairman calling for an AGI development plan. Several proposals involved discussion of AI security from an information security or cybersecurity perspective, with some arguing for greater investment around AI applications in cybersecurity.
The NPC Standing Committee’s work report made no mention of plans to work on the Artificial Intelligence Law in 2024.
According to Hong Kong-based South China Morning Post, Chinese Ambassador to the UN ZHANG Jun (张军) discussed AI during a CPPCC panel discussion. Zhang reportedly said that China should promote AI regulation, pursue more “forward-looking” AI research, and noted that in the US “many things about AI are still controlled at the corporate level and research level.”
High level officials conducted site visits to AI labs around the Two Sessions. Premier LI Qiang (李强) visited Baidu and the Beijing Academy of AI (BAAI) on March 13, discussing the importance of AI in future economic development. The head of China’s macroeconomic planner, the National Development and Reform Commission (NDRC), visited Baidu, BAAI, and startups Baichuan Intelligence and Zhipu AI on March 2.
Implications: Overall, the Two Sessions sent a mixed signal on AI. Beyond the AI+ initiative, which appears to be more application-focused, there was not a strong signal on AI fundamental research or AI safety and governance. However, the site visits by Premier Li and the head of NDRC to major Chinese large model developers indicate that frontier AI development is still of great interest to high level officials. Separately, it is unclear how quickly the national AI Law – first mentioned in June 2023 – will be drafted given the omission in the 2024 NPC Standing Committee work report.
Standard on generative AI security finalized
Background: On February 29, a Chinese standards body known as Technical Committee 260 (TC260) published a document titled Basic security requirements for generative artificial intelligence service.4 This is the finalized version of a draft originally published in October 2023 and will likely be used to guide Chinese research entities in implementing the security assessment required in a 2023 regulation on generative AI.
Requirements of the standard: The document focuses on security from the perspective of corpus origin, corpus content, corpus watermarking, and model security. Key risks are outlined in Appendix A of the document, including content that violates core socialist values, discriminatory content, commercial violations, violations of individual rights, and lack of reliability or inaccuracy in more sensitive areas such as psychological counseling and critical information infrastructure. The document requires generative AI providers to conduct a security assessment either through a third party organization or by themselves. It sets out concrete quantitative tests to determine whether models are safe: for instance, testing at least 4,000 samples from the data corpus for compliance (requiring 96% compliance rate), or testing generated content from at least 1,000 questions against a “keywords” list of 10,000 words (requiring 90% compliance).
Implications: This standard, like most of China’s existing AI regulations, is geared primarily towards objectives of social stability and content moderation. Nevertheless, it is interesting to see the approach of stipulating specific quantitative thresholds that must be met. Other documents by government standards bodies, including a 2023 guide for ethics governance, have referenced the importance of values alignment.5 It remains to be seen if these standards will incorporate frontier AI risks such as biological misuse and loss of control into the security testing requirements in the future.
Beijing municipal government creates expert committee for AI
Background: On March 1, the Beijing municipal government announced the creation of an AI Strategic Advice Expert Committee. Representatives from numerous Beijing government departments attended, including the municipal Science & Technology Commission and Cyberspace Administration.
Committee composition and discussion: There are at least seven experts on the committee, including representatives from the Chinese Academy of Sciences, Tsinghua University, Peking University, Baidu, LM startup Zhipu AI, and startup incubator MiraclePlus. During the first meeting of the committee, topics of discussion included AGI development, large models, intelligent chips, AI safety and governance, and talent and ecology cultivation.
Implications: Given that Beijing is home to most of China’s large model companies, including some of the top companies (such as Baidu and Zhipu AI), its local policies are likely nationally influential. It is notable that the first meeting primarily discussed frontier AI, given the references to large models and AGI, and included safety and governance issues. Of the members of this committee, Professor ZENG Yi from the Chinese Academy of Sciences is the only one thus far to have publicly expressed major concerns about frontier AI risks.
Technical Safety Developments
Shanghai Jiao Tong University team creates benchmark for LLM agent safety
Background: On January 18, researchers from Shanghai Jiao Tong University published R-Judge: Benchmarking Safety Risk Awareness for LLM Agents. Leaders of the research team included Professor WANG Rui (王瑞) and Professor ZHANG Zhuosheng (张倬胜); Zhang also recently published a paper with international collaborators on AI agents in science.
R-Judge: The paper addresses the behavioral safety of LLM agents in different environments through a new benchmark called R-Judge. R-Judge contains “162 records of multi-turn agent interaction,” focusing primarily on 10 risk types. The risks are not focused on frontier AI safety considerations, and categories include physical health, computer security, illegal activities, and ethics & morality. They tested the safety of 9 LLMs, finding that most fail to identify safety risks. The authors state that this is the first benchmark they are aware of focused on risk awareness of LLMs for agent safety.
Tianjin University group explores aligning to human values across languages
Background: On February 28, a research team from Tianjin University published a preprint titled Exploring Multilingual Human Value Concepts in Large Language Models: Is Value Alignment Consistent, Transferable and Controllable across Languages? The group was led by Professor XIONG Deyi (熊德意), director of the Tianjin University Natural Language Processing Laboratory (TJUNLP), whose surveys on LLM alignment and evaluation were previously covered in Issue #4.
The paper: The research group explored whether LLMs “encode concepts representing human values in multiple languages,” whether such concepts are transferable across languages, and how value alignment could be controlled across languages. To this end, the researchers extracted multilingual human value concept vectors from LLMs, used them to recognize concepts, compared them across languages, and sought to use them to control model behavior. The paper covers seven concepts of human values: morality, deontology, utilitarianism, fairness, truthfulness, toxicity, and harmfulness. They examined 16 languages and 3 LLM families. The researchers found that LLMs do encode representations of human values across various languages. They found that the transferability of these concepts varies based on the multilingual patterns of different models, and that aligning values can be successfully transferred across languages.
Groups involving Microsoft Research Asia published two papers on alignment
Background: Research groups involving Microsoft Research Asia (MSRA) published two preprints on alignment. The paper Negating Negatives: Alignment without Human Positive Samples via Distributional Dispreference Optimization, published on March 6, was co-written by Fudan University scholars including Professor GU Ning (顾宁). On March 7, the preprint On the Essence and Prospect: An Investigation of Alignment Approaches for Big Models was released with a number of collaborators. MSRA Societal AI lead XIE Xing (谢幸) was involved in both papers, and we have covered his previous work in issues #2, #5, and #7.
Negating Negatives: This paper seeks to achieve alignment through “solely human-annotated negative samples,” which can still retain helpfulness while reducing harmfulness. This is similar to work on LLM unlearning. The researchers created a method called Distributional Dispreference Optimization (D2O) that “maximizes the discrepancy between the generated responses and the dispreferred ones to effectively eschew harmful information.” They find that this method outperforms others in reducing harmfulness while maintaining helpfulness.
On the Essence and Prospect: This paper surveys different approaches to value alignment, including analyzing the historical context of alignment, the formal mathematical definition of alignment (its “essence”), and inherent challenges. The authors categorize alignment methods into reinforcement learning, supervised fine-tuning, and in-context learning, noting pros and cons of each approach. They also note that two novel frontiers in the field are personal alignment and multimodal alignment.
Other relevant technical publications
Chinese University of Hong Kong and Tsinghua University, GuardT2I: Defending Text-to-Image Models from Adversarial Prompts, arXiv preprint, March 3, 2024.
Chinese Academy of Sciences, Tsinghua University, and RealAI, Making Them Ask and Answer: Jailbreaking Large Language Models in Few Queries via Disguise and Reconstruction, arXiv preprint, February 28, 2024.
Tsinghua University, Renmin University, WeChat AI, Tencent, Controllable Preference Optimization: Toward Controllable Multi-Objective Alignment, arXiv preprint, February 29, 2024.
The University of Hong Kong, ImgTrojan: Jailbreaking Vision-Language Models with ONE Image, arXiv preprint, March 5, 2024.
Chinese University of Hong Kong and IBM Research, Gradient Cuff: Detecting Jailbreak Attacks on Large Language Models by Exploring Refusal Loss Landscapes, arXiv preprint, March 1, 2024.
Expert views on AI Risks
Chinese government think tank researcher writes on intersection of AI and biological risks
Background: On January 26, researcher ZHANG Ruiqing (张芮晴) published an article titled The integration of biological technology and AI creates new forms of biological security risks. Zhang works for the Development Research Center (DRC) Institute of International Technology and Economy.6 The DRC is a government-affiliated think tank directly subordinated to China’s cabinet, the State Council, making it one of the most influential tanks in China.
Bio and AI risks: Zhang argues that biosecurity risks can be increased by LLMs and also by biological design tools (BDTs). LLMs increase risk by enhancing knowledge accessibility and thus lowering the threshold for biological abuse for non-experts, through screening rapidly for toxic molecules and helping to identify specific pathways for bioweapons development. BDTs can increase the potential harm of biological misuse and may be able to generate dangerous substances that do not rely on existing controlled substances. Moving forward, Zhang calls for the international community to work together and control the potential risks of accidental or purposeful biological misuse.
Implications: This is one of the first in-depth analyses published by a Chinese expert on the intersection of AI and biosecurity risks. Concordia AI has previously analyzed technical papers involving Chinese researchers on preventing misuse of AI in science, and Chinese research entities have previously discussed AI and biosecurity in the context of global trends. This new article suggests that Chinese policy analysts are also becoming concerned with this domain, and DRC is particularly influential in Chinese domestic policy. AI’s impacts on biosecurity could be a fruitful area for international cooperation given the transnational threat it poses.
What else we’re reading
Gladstone AI, An Action Plan to increase the safety and security of advanced AI, February 2024.
Ben Garfinkel, Markus Anderljung, Lennart Heim, Robert Trager, Ben Clifford, Elizabeth Seger, Goals for the Second AI Safety Summit, Centre for the Governance of AI, March 4, 2024.
Paul Scharre and Vivek Chilukuri, What an American Approach to AI Regulation Should Look Like, Time, March 5, 2024.
Concordia AI’s Recent Work
Concordia AI Senior Program Manager Kwan Yee Ng is teaching part of an online course on AI ethics hosted by the National University of Singapore's Centre for Biomedical Ethics. Other instructors of the 14-part course include Prof. Walter Sinnott-Armstrong, Prof. Simon Chesterman, and Prof. Vincent Conitzer.
Feedback and Suggestions
Please reach out to us at info@concordia-ai.com if you have any feedback, comments, or suggestions for topics for the newsletter to cover.
Wang Yi is simultaneously a member of the Politburo (top 24 officials in the Communist Party of China), Director of the Office of the Central Foreign Affairs Commission, and Foreign Minister.
The NPC is a unicameral legislature in China, responsible for passing laws and personnel appointments. CPPCC is an advisory body that lacks legislative powers, sometimes analogized to the House of Lords in the UK.
The Government Work Report recapped work by the entire state-side apparatus in China from 2023 and set out the top priorities for work in 2024.
TC260’s Chinese name is 全国网络安全标准化技术委员会, and the Chinese name of the document is 生成式人工智能服务安全基本要求.
See Concordia AI’s State of AI Safety in China report, page 15.
In Chinese, 国务院发展研究中心国际技术经济研究所.