AI Safety in China #4
China attends the UK Global AI Safety Summit, leading Chinese and Western AI scientists propose an AI risk mitigation strategy, China finalizes ethics reviews for AI, and new surveys cover AI alignment and evaluations
Key Takeaways
A Chinese Vice Minister of Science and Technology attended the Global AI Safety Summit in the UK, where multiple countries, including China, signed the Bletchley Declaration. Several non-governmental Chinese institutions also attended, including Concordia AI.
A new AI safety dialogue between Chinese and Western scientists, together with a joint article co-authored by a group of prominent Chinese and Western scientists, highlights agreement on frontier AI risks and policy recommendations to mitigate those risks.
China’s Science and Technology Ethics Review process has been finalized, imposing new requirements for reviews during the R&D process for certain AI applications.
Chinese research groups have published three new comprehensive surveys on AI/LLM alignment and evaluations.
(Note: This issue covers developments since September 20 and was postponed due to the release of our State of AI Safety in China report on October 23. Regular biweekly updates will resume following this issue.)
International AI Governance
China attends Global AI Safety Summit in the UK
Background: The UK held the Global AI Safety Summit on November 1 and 2, resulting in the publication of the Bletchley Declaration, which resolves “to support an internationally inclusive network of scientific research on frontier AI safety” and “to sustain an inclusive global dialogue” on AI. China was one of the 28 countries, along with the EU, that signed the declaration. Chinese private sector actors also attended the summit.
Chinese government representative: Chinese Vice Minister of Science and Technology WU Zhaohui (吴朝晖) attended the summit and gave remarks at the opening plenary. Referencing China’s Global AI Governance Initiative, he called for strengthening management of technological risks, ensuring AI remains under human control, and increasing the representation and voices of developing countries in AI governance. He additionally stated: “China is willing to enhance our dialogue and communication in AI safety with all sides, contributing to an international mechanism with broad participation in governance framework.”
Non-governmental representatives: Non-governmental Chinese attendees at the summit included representatives from the Chinese Academy of Sciences, Alibaba, Tencent, and Concordia AI’s CEO, Brian Tse. Concordia AI will share additional insights in a follow-up article.
Implications: Active Chinese participation in the UK Global AI Safety Summit and the Chinese government’s signing of the Bletchley Declaration demonstrate appetite in China to engage globally on AI issues. After the summit, UK Prime Minister Sunak said in an interview with Elon Musk that he was “pleased” about China’s participation, and noted positively that China had also “ended up signing the same communique that everyone else did, which is a good start.”
New AI safety dialogue reveals consensus between Chinese and Western AI Scientists
Background: A group of scientists from the US, China, UK, Europe, and Canada held the first “International Dialogue for AI Safety” at Ditchley Park, UK. The meeting was “convened by Turing Award winners Yoshua Bengio and Andrew Yao, UC Berkeley professor Stuart Russell, OBE, and founding Dean of the Tsinghua Institute for AI Industry Research Ya-Qin Zhang.”
Joint Statement: The event produced a substantive joint statement which discussed “near-term risks from malicious actors misusing frontier AI systems,” such as misinformation and helping terrorists develop weapons of mass destruction, as well as “a serious risk that future AI systems may escape human control altogether.” It argued that, “taken together, we believe AI may pose an existential risk to humanity in the coming decades.” The statement provided specific recommendations:
National governments should mandate registration of models above a certain capability threshold, require third-party audits of information security and model safety, and ensure that developers share risk assessments.
Governments should cooperate to define and enforce clear red lines that, if crossed, would result in termination of AI systems.
Leading AI developers and governments should devote a minimum of one third of their AI R&D funding to AI safety and/or governance research.
Implications: The recommendations on model registration and audits are notable: while China already has similar policies on algorithm registration and security reviews, those policies do not currently focus on frontier AI safety. The call for devoting a set percentage of AI R&D funding to safety and governance issues is a powerful proposal, and the strong standing of the Chinese co-convenors will strengthen the appeal of these proposals in China.
China publishes comprehensive document on global AI governance
Background: President Xi Jinping announced the new Global AI Governance Initiative (全球人工智能治理倡议) at the opening ceremony of the Third Belt and Road Forum for International Cooperation on October 18. The full document sets out China’s core positions on values for AI development and areas for international cooperation on AI.
Key Positions: The initiative covers broad ground by discussing the role of AI in sustainable development, respecting other countries’ national sovereignty when it comes to AI, ensuring that all countries have equal rights to develop and use AI, strengthening privacy, and preventing bias and discrimination. On issues relating to safety of frontier models, the document calls for working “together to prevent and fight against the misuse and malicious use of AI technologies by terrorists, extreme forces, and transnational organized criminal groups.” It supports creating a “testing and assessment system based on AI risk levels” and calls for R&D entities to “ensure that AI always remains under human control” and improve AI explainability. It notes the importance of building a science and technology ethics review system. It supports increasing representation of developing countries in global AI governance and bridging AI gaps. It also expresses support for “discussions within the United Nations framework to establish an international institution to govern AI, and to coordinate efforts to address major issues concerning international AI development, security, and governance.”1
Implications: This is the most comprehensive document China has issued thus far on international AI governance. Its publication in a high-profile venue, announced by President Xi, indicates its prominent policy significance. The reference to terrorism suggests that China may be interested in global information sharing mechanisms on potential uses of AI by terrorists. While it is unclear what sort of international institution on AI might be supported by China, the document makes clear that discussions should occur under the auspices of the UN, and China would likely attempt to ensure strong presence of developing countries in such a body.
Domestic AI Governance
Chinese government finalizes Science and Technology Ethics Review process
Background: In October, the Ministry of Science and Technology (MOST) and nine other government departments released the finalized Science and Technology (S&T) Ethics Review Plan (trial). The document was approved by the Central Science and Technology Commission (中央科技委员会), a higher-level body directly under the Communist Party of China (CPC) Central Committee, and will come into effect on December 1.
Ethics review plan: The document mandates that institutions conducting AI research in “sensitive fields of ethics” establish an S&T ethics review committee. The requirements for the committee are detailed, requiring at least seven members and recommending a mix of expertise and geographic backgrounds. Ethics reviews must include, among other things, documentation of the R&D activity; its purpose, significance, and necessity; ethics risks and mitigation strategies; the research experience of participants; and the status of their participation in scientific and ethical training. Typically, the institution’s review committee conducts ethics reviews; however, for applications on MOST's “expert review” list, local or ministerial authorities must arrange an independent expert review prior to technology development. AI applications currently on the “expert review” list include: human-machine integrations that strongly affect human emotions and health, algorithms with the ability to mobilize public opinion and guide social consciousness, and highly autonomous decision-making systems that pose risks to safety or personal health. While the exact meaning of some of these categories is unclear, the last one in particular could cover a broad range of AI systems, and requiring reviews during the R&D phase is unusual compared with other Chinese regulations.2
Implications: The S&T ethics review system may develop into an important tool for AI safety and governance in China. The plan is highly detailed, which will aid enforcement and suggests it is the result of substantial deliberation. The current ethics review plan touches on a number of use cases relevant to AI safety, but its significance will depend on how stringently it is enforced.
Chinese AI industry association undertakes "Deep Alignment" project
Background: China’s Artificial Intelligence Industry Alliance (AIIA) (中国人工智能产业发展联盟) is a prominent industry association founded in 2017 under the direction of four government departments (NDRC, MOST, MIIT, and CAC) and led by institutions including CAICT, a research institute under MIIT.3 It counts a number of leading scientific research institutions, companies, and SOEs among its Chair and Vice-Chair organizations, signifying prominence, though other associations, such as the more academically-oriented China Association for Artificial Intelligence (中国人工智能协会), are also notable.
AI alignment and security projects: On October 8, AIIA announced a project with CAICT called “Deep Alignment,” noting that aligning AI to human values has become “increasingly urgent.”4 As part of this venture, AIIA announced its intent to recruit a first batch of partners and disclosed plans to release a research report titled “AI Value Alignment Operationalization Guide” (人工智能价值对齐操作指南). The project also seeks to promote the development of technical tools to evaluate model alignment. Earlier, on September 27, AIIA announced that CAICT had entrusted it with creating a “safety/security governance committee.” This body is tasked with channeling industry inputs into policy processes, supporting the development of AI safety/security platforms, promoting learning from the safety supervision industry, and spearheading discussions on AI safety/security governance.5
Implications: These two projects show growing interest by AIIA in AI safety and security issues. While some leading US AI companies have made commitments around red teaming, bounty programs for AI vulnerabilities, and watermarking, as well as forming the Frontier Model Forum, Chinese companies have preferred to issue more general statements on AI ethics and governance. AIIA could become a venue where Chinese companies consider more industry-wide action to pursue common standards or commitments on AI safety and security.
Draft standard outlines concrete requirements for generative AI security
Background: China’s National Information Security Standardization Technical Committee (TC260) released a draft standard on “Basic security requirements for generative artificial intelligence service” on October 11. The standard is intended to guide Chinese generative AI providers or testing institutions in conducting security assessments. These assessments would form part of the materials that generative AI providers must submit when registering with regulatory departments.
Key content: The draft standard lists 31 specific security items in Appendix A, split across five categories: content that violates socialist core values, discriminatory content, violations of commercial rules, violations of the legitimate rights and interests of others, and security requirements for specific service types. It would impose criteria for evaluating the security of both training data and generated content, for instance requiring manual sampling of 4,000 items from the training data (with a pass rate of at least 96%) and manual testing of generated content on 1,000 test questions (with a pass rate of at least 90%).
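To make the sampling arithmetic above concrete, the sketch below (our own illustration, not a tool from the draft standard) draws the stated sample sizes and checks the two pass-rate thresholds; the corpus, answer set, and compliance check are hypothetical stand-ins for the manual review the standard envisions.

```python
# Minimal sketch of the draft standard's sampling arithmetic (illustrative only).
# `training_corpus`, `generated_answers`, and `is_compliant` are hypothetical
# stand-ins; in practice the standard envisions manual review by assessors.
import random

def sampled_pass_rate(items, sample_size, is_compliant):
    """Pass rate over a random sample of `sample_size` items."""
    sample = random.sample(items, sample_size)
    passed = sum(1 for item in sample if is_compliant(item))
    return passed / sample_size

def meets_draft_thresholds(training_corpus, generated_answers, is_compliant):
    # As described above: 4,000 sampled training items at >= 96% compliance,
    # and 1,000 generated answers to test questions at >= 90% compliance.
    training_ok = sampled_pass_rate(training_corpus, 4000, is_compliant) >= 0.96
    generation_ok = sampled_pass_rate(generated_answers, 1000, is_compliant) >= 0.90
    return training_ok and generation_ok
```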
Implications: While this draft standard currently focuses on issues of content control, discrimination, privacy, and the like, rather than frontier AI safety, the proposal of specific testing methods and criteria for security assessments shows that the Chinese regulatory apparatus is actively thinking through how to evaluate AI model security. These principles could later be applied to frontier model risks if policymakers deem it necessary. Although this is a draft that could still change substantially, the criteria under consideration reflect policymaker concerns.
Technical Safety Developments
Senior Chinese AI researchers release survey on AI alignment
On October 30, a team of senior Chinese AI researchers released a preprint titled AI Alignment: A Comprehensive Survey. Notable co-authors include Peng Cheng Laboratory Director GAO Wen (高文), Hong Kong University of Science and Technology Provost GUO Yike (郭毅可), Beijing Institute for General Artificial Intelligence Dean ZHU Song-Chun (朱松纯), and PKU PAIR Lab director YANG Yaodong (杨耀东). Concordia AI’s Brian Tse, Yawen Duan, and Kwan Yee Ng also participated as co-authors. The survey outlines the landscape of current alignment research and categorizes it into forward alignment, such as learning from feedback and learning under distribution shift, and backward alignment, such as assurance techniques and governance practices. The authors have also created a website at alignmentsurvey.com with tutorials and other resources on AI alignment.
Tianjin University releases LLM alignment and evaluation surveys
Tianjin University research teams led by XIONG Deyi (熊德意), director of the Tianjin University Natural Language Processing Laboratory (TJUNLP), released Large Language Model Alignment: A Survey on September 26 and Evaluating Large Language Models: A Comprehensive Survey on October 30. The first paper categorizes existing large language model (LLM) alignment methods into “inner” and “outer” alignment and discusses approaches such as scalable oversight, mechanistic interpretability, and robustness to adversarial attacks. The second paper categorizes LLM evaluations into knowledge and capability evaluations, alignment evaluations, and safety evaluations, providing examples of evaluations for topics such as alignment with ethics and morality, robustness, and risky behaviors like power seeking and self-awareness.
Investigating adversarial robustness of Multimodal Large Language Models (MLLMs)
A preprint published on September 21 tests the adversarial robustness of Multimodal Large Language Models (MLLMs) to vision inputs. The study was led by ZHU Jun (朱军), a professor at Tsinghua University and co-founder of AI security startup RealAI. The researchers generated adversarial examples against white-box surrogate vision encoders or MLLMs and transferred those attacks to Bard, Bing Chat, ERNIE Bot, and GPT-4V, achieving success rates of 22%, 26%, 86%, and 45% respectively. The researchers also identified two defense mechanisms used by Bard and developed methods to evade them.
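As a rough illustration of the transfer-attack setup described above (a minimal sketch under stated assumptions, not the authors’ released code), one can run projected gradient descent against a white-box surrogate vision encoder to push an image’s embedding away from its clean embedding, then submit the perturbed image to a black-box MLLM; `surrogate_encoder` here is a hypothetical differentiable image encoder such as a CLIP vision tower.

```python
# Sketch of an embedding-deviation transfer attack (illustrative assumptions).
# `surrogate_encoder` is a hypothetical differentiable image encoder mapping an
# image tensor to a feature vector; the perturbed image would then be sent to a
# black-box multimodal model in the hope that the attack transfers.
import torch

def embedding_deviation_attack(image, surrogate_encoder, eps=8/255, alpha=1/255, steps=100):
    clean_emb = surrogate_encoder(image).detach()           # embedding of the clean image
    delta = torch.zeros_like(image, requires_grad=True)     # adversarial perturbation
    for _ in range(steps):
        loss = torch.norm(surrogate_encoder(image + delta) - clean_emb, p=2)
        loss.backward()                                      # gradient of embedding distance
        with torch.no_grad():
            delta += alpha * delta.grad.sign()               # ascend to maximize deviation
            delta.clamp_(-eps, eps)                          # stay inside the L-infinity ball
            delta.copy_(torch.clamp(image + delta, 0, 1) - image)  # keep pixels in [0, 1]
        delta.grad.zero_()
    return (image + delta).detach()                          # candidate adversarial image
```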
Research on removing safety features from open-source models
A research team from Shanghai AI Laboratory, Fudan University, and the University of California, Santa Barbara published a preprint on the safety of open-source LLMs. The team was led by LIN Dahua (林達華), a professor in the Chinese University of Hong Kong Department of Information Engineering who is affiliated with Shanghai AI Laboratory. The researchers found that by tuning such models on just 100 malicious English-language examples with one GPU hour, LLMs can be made to generate harmful content and evade existing safety measures, while largely remaining helpful and capable on ordinary tasks. They tested this method across open-source models including LLaMa-2, Falcon, InternLM, BaiChuan2, and Vicuna.
Huawei and HKUST pursue LLM alignment using mistake analysis
A research team from Huawei Noah’s Ark Lab and the Hong Kong University of Science and Technology (HKUST) published a preprint on aligning LLMs via mistake analysis. The study was conducted by a research group led by LIU Qun (刘群), Chief Scientist of Speech and Language Computing at Huawei Noah’s Ark Lab. The researchers sought to align LLMs using mistakes without allowing the models to be corrupted by toxic inputs: they exposed LLMs to flawed outputs, used natural-language analysis to help the models examine why the content was harmful, and thereby helped the models understand what content should and should not be generated. In tests on Alpaca-7B, they found that this method outperforms common alignment methods such as supervised fine-tuning and reinforcement learning from human feedback (RLHF).
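As a rough sketch of this guided mistake-analysis idea (our own illustration with hypothetical prompt wording and field names, not the paper’s method verbatim), one can show a model an instruction and a flawed response, ask it to explain what is wrong, and then reuse the resulting analyses as fine-tuning data.

```python
# Illustrative mistake-analysis data construction (hypothetical prompt template
# and field names). A model is guided to analyze why a flawed response is
# harmful; the (prompt, analysis) pairs can then be used for fine-tuning.
ANALYSIS_PROMPT = (
    "Instruction: {instruction}\n"
    "Response: {flawed_response}\n"
    "The response above is problematic. Explain what makes it harmful or "
    "incorrect, and describe what a safe, helpful response should do instead."
)

def build_mistake_analysis_example(instruction, flawed_response, generate):
    """`generate` is any text-generation callable wrapping an LLM."""
    prompt = ANALYSIS_PROMPT.format(
        instruction=instruction, flawed_response=flawed_response
    )
    analysis = generate(prompt)                    # model writes the analysis
    return {"prompt": prompt, "target": analysis}  # later used as a fine-tuning pair
```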
Survey on LLM-based agents discusses AI safety and threats to the human race
A research team from the Fudan University Natural Language Processing Group published a preprint survey on LLM-based agents on September 14. The survey was led by several of the group’s senior researchers working on NLP robustness, including QIU Xipeng (邱锡鹏) and HUANG Xuanjing (黄萱菁). The survey covers many aspects of LLM-based agents, including a section on “Security, Trustworthiness and Other Potential Risks of LLM-based agents.” That section addresses adversarial robustness and trustworthiness, as well as the potential for LLM-based agents to be misused in cybersecurity attacks, fraud, fake information, or terrorism. The survey additionally notes the possibility that humans will not be able to “reliably control” AI agents, and that “if these agents advance to a level of intelligence surpassing human capabilities and develop ambitions, they could potentially attempt to seize control of the world.” The researchers therefore believe it is necessary to “comprehensively comprehend the operational mechanisms of these potent LLM-based agents before their development.”
Expert Views on AI Risks
Prominent Chinese experts co-author article on AI risks
Background: A group of 24 important thinkers on AI published an article titled “Managing AI Risks in an Era of Rapid Progress.” Four notable Chinese thinkers were listed as co-authors: Turing Award winner and Dean of the Institute for Interdisciplinary Information Sciences (IIIS) at Tsinghua University Andrew Yao (姚期智), Dean of the Tsinghua Institute for AI Industry Research ZHANG Ya-Qin (张亚勤), Dean of Schwarzman College and Dean of Institute for AI International Governance of Tsinghua University (I-AIIG) XUE Lan (薛澜), and Dean of the Political Science Institute at East China University of Political Science and Law GAO Qiqi (高奇琦). Professors Xue Lan and Gao Qiqi are also on MOST’s National New Generation AI Governance Expert Committee.
Key points: The article highlights risks of AI, especially the chance that future autonomous systems could adopt harmful objectives, deceive humans, or accumulate power and influence in society, leading to a potentially irreversible loss of human control over these systems. This could result in “large-scale loss of life and the biosphere, and the marginalization or even extinction of humanity.” The article therefore calls for addressing ongoing harms, such as misinformation and discrimination, as well as anticipating emerging risks. It recommends that major tech companies and public funders spend at least one third of their AI R&D budget on safety and ethics issues, and calls for national governments to require registration of frontier AI models and to evaluate them for dangerous capabilities.
Concordia AI’s Recent Work
Concordia AI CEO Brian Tse was invited to moderate a subforum on AI-generated content at the Boao Forum for Asia’s Global Economic Development and Security Forum on October 29. At the forum, Concordia AI released a Chinese-language report on Frontier Large Model Risks, Safety and Governance. The subforum was presided over by JI Weidong (季卫东), President of the China Institute for Socio-Legal Studies at Shanghai Jiao Tong University.
What else we’re reading:
Henry A. Kissinger and Graham Allison, The Path to AI Arms Control: America and China Must Work Together to Avert Catastrophe, Foreign Affairs, October 13, 2023.
LU Chuanying (鲁传颍) et al., International Rules of Artificial Intelligence: Trends, Domains and China's Role, Shanghai Institutes for International Studies, Center for International Strategy and Security at Tsinghua University, and Peking University Internet Development Research Institution, October 2023.
Feedback and Suggestions
Please reach out to us at info@concordia-ai.com if you have any feedback, comments, or suggestions for topics for the newsletter to cover.
1. This section draws upon Concordia AI’s State of AI Safety in China report, pages 24-33.
2. This section draws upon Concordia AI’s State of AI Safety in China report, pages 15-17.
3. NDRC is the National Development and Reform Commission, China’s economic planning agency. MIIT is the Ministry of Industry and Information Technology, the regulator of IT such as telecoms and mobile apps. The CAC is the Cyberspace Administration of China, responsible for content control and data security oversight. CAICT is the China Academy of Information and Communications Technology, a think tank overseen by MIIT.
4. “Deep Alignment” is the English name provided by AIIA; the direct translation of the Chinese name is “AI Value Alignment Partnership Plan” (人工智能价值对齐伙伴计划).
5. This section draws upon Concordia AI’s State of AI Safety in China report, pages 63-66.