AI Safety in China #9
Chinese premier on AI at Davos, government think tank on frontier risks, research on alignment of AI agents, and AGI speech at Chinese political consultative body
Key Takeaways
China’s premier discussed the importance of AI security and ethics at Davos, including highlighting Shanghai’s AI ethics expert committee and the World AI Conference.
A government think tank published a report that discussed international cooperation on frontier AI risks, referencing concerns about CBRN and AI, use of AI in biological synthesis, and AI in critical infrastructure.
Chinese research groups published new preprints on alignment of LLM-based agents and defending against backdoor attacks.
BIGAI director ZHU Songchun (朱松纯) gave a speech to a Chinese political consultative body on AGI development and risks at the invitation of the former Minister of Science and Technology.
International AI Governance
Chinese Premier discusses AI governance at Davos
Background: Chinese Premier LI Qiang (李强) gave a speech at the World Economic Forum in Davos on January 16. Afterwards, he gave a lengthy reply to a question about “what role will China play” with regard to global governance of AI.
Li’s remarks on AI: Premier Li noted the potential of AI to bring progress to human civilization, but also risks to security and ethics. He stated that humans must have control over machines, with a red line in AI development that must be observed by all and never crossed. At the same time, AI development should be “inclusive and beneficial to all,” benefiting the “overall majority of mankind,” and prioritizing the interests of developing countries. Premier Li referenced Shanghai’s World AI Conference, also highlighting that Shanghai had created expert committees for “AI strategic advice” and “AI ethics.”
Implications: The relevant Shanghai expert committees were established by 2019 and 2022, respectively, so these are not new developments, and Premier Li did not reveal any new policies. His references to WAIC and Shanghai’s ethics committees likely stem from his leadership of the city from 2017 to 2022; they may indicate a special interest in the city’s AI initiatives during that time, but this remains unclear.
Government think tank discusses frontier risks in paper on international governance
Background: In December 2023, the China Academy of Information and Communications Technology (CAICT), a think tank overseen by the Ministry of Industry and Information Technology (MIIT), published a white paper on global digital governance.1 CAICT also wrote a report on large model governance in November 2023 and takes part in AI safety-related projects conducted by China’s Artificial Intelligence Industry Alliance (AIIA).
Discussion of AI: The paper discusses a number of risks from AI. It notes that hallucinations and “emergence” in large AI models create governance uncertainties and a risk of loss of control of AI. It also discusses the risk of misuse of AI technology and the dangers of lethal autonomous weapons (LAWs) to world peace. The paper cites the United Nations High-Level Advisory Body on AI’s interim report, Anthropic, and others in discussing the interaction of CBRN (化生放核; Chemical, Biological, Radiological, and Nuclear) risks with AI, risks of AI in critical infrastructure, and the possibility that human values and knowledge will be replaced over a long period of human-machine interaction. These are some of the first such discussions in Chinese think tank circles, though they largely reference non-Chinese sources. The paper also notes that China-US dialogue on AI will help advance global AI governance. In discussing notable governance tools worldwide, it references digital watermarking, red-teaming, and taking a cautious approach to open-sourcing models. In a final section of recommendations, the report calls for discussions in various UN agencies and multilateral organizations on restraining military applications of AI and the use of AI to generate false information.
Implications: CAICT is an influential voice given its government background. This white paper and the previous report on large model governance both include in-depth discussion of frontier risks that have not yet been discussed as intensively in other Chinese think tank or industry venues. In particular, the discussion of risks at the intersection of AI and CBRN is a sign that this issue is starting to gain some prominence in China, and it could be one of the most promising topics for international cooperation given the transnational nature of the threat.
Domestic AI Governance
China takes steps to develop further standards on AI
Background: On January 18, MIIT published the National AI Industry Comprehensive Standardization System Construction Guide (draft for comments). The document outlines objectives for AI industry standardization in China as well as a categorization of standards into different types. In addition, on December 28, 2023, the Standardization Administration of China (SAC) announced that it was soliciting organizations to provide input on drafts of seven AI-related standards.
MIIT’s new guide: MIIT’s guide constitutes its implementation of an AI standardization construction document published by five government agencies in 2020. The document lists goals for AI industry standards, calling for China to create 50 new national standards and industry standards as well as participate in more than 20 international standards on AI by 2026. The document divides the standards system into six categories, including one category called “safety/security and governance.” Within this last category, the section on safety/security largely mirrors language in the 2020 document, calling for model safety/security, data and algorithm security, and systemic security, among others. However, governance is a new category and makes new references to robustness, reliability, traceability, and explainability in ethics review assessments.
New standards drafting: Among the seven new AI standards under development by SAC, one is on “Risk Management Capability Assessment” and one is named “Pretrained Model Part 2: Testing Indicators and Methods.” SAC is still soliciting drafting organizations for these standards, so the final versions will likely take 1-2 years to develop and publish.
Implications: Both of these developments highlight that China’s standards system is working to create new standards around AI safety and governance. Since most such standards are still under development, it will be important to see which safety and governance concerns are eventually addressed. Given the technical nature of standards, new standards on AI safety could offer opportunities for mutual learning between China and other countries.
New local government policy on the AI industry from Zhejiang
Background: On January 10, the Zhejiang provincial government released a policy on AI industry development, joining similar policies issued by Shenzhen and Shanghai in the fall of 2022. While its title does not contain “large models” or “artificial general intelligence (AGI),” unlike 2023 policies from Beijing, Shanghai, Guangdong, and Anhui, it still frequently mentions those keywords in the text, suggesting a high degree of attention to frontier models.
Policy focuses: Most of the document is focused on improving development of advanced AI, with the goals of having more than 3000 AI companies, creating a top-tier domestic AGI development environment, and having an AI industry on the scale of 1 trillion RMB by 2027. There are also clauses relevant to AI safety, especially clause 6.4, which is focused on strengthening safety/security and ethics governance. That section calls for creating an AI supervision and governance system and emphasizes the role of the provincial science and technology (S&T) ethics committee. It notes that the S&T ethics committee should strengthen research on ethics and safety norms and social governance practices, as well as conduct security reviews and ethics reviews in important domains. Separately, the policy calls for exploring creation of a negative list and registration system for AI, also mentioned in the Shanghai policy from 2022.
Implications: It remains unclear which domestic governance policies for frontier AI risks will receive the most focus from the central government. This policy focuses on S&T ethics and a negative list, while some other recent policies focus more on alignment research, as well as testing and evaluations. As an important AI industry hub and home to Alibaba, Zhejiang will be another important example to follow as China explores domestic AI governance mechanisms for its upcoming national AI law.
Technical Safety Developments
Research on LLM agent alignment from Fudan
Background: On January 9, a research group from Fudan University published a preprint titled “Agent Alignment in Evolving Social Norms.” The group was led by QIU Xipeng (邱锡鹏), a professor at Fudan’s School of Computer Science and part of the Fudan University Natural Language Processing Group. This newsletter has also covered previous research by Dr. Qiu relating to AI safety.
Agent alignment research: The research group examined alignment of LLM-based agents, arguing that current methods do not sufficiently account for feedback that agents receive from the environment. They believe that just aligning the LLM is too static and does not account for evolving social norms. Therefore, they “reframe the agent alignment issue as a survival-of-the-fittest behavior in a multi-agent society.” They created a dynamic virtual environment, allowed agents to interact, and created a social observer to assess alignment of agents to the prevailing social norms. The agents more aligned with prevailing social norms were allowed to reproduce, creating a more-aligned set of agents that also evolved with social norms.
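To make the setup above concrete, the following is a minimal sketch, not the authors’ code, of the evolutionary loop they describe: agents act in a shared environment, a “social observer” scores how well each action matches the prevailing norm, and better-aligned agents reproduce as norms drift. The class and function names, the scoring heuristic, and the numeric parameters are all illustrative assumptions.

```python
# A minimal sketch (not the authors' code) of the evolutionary alignment loop described
# above; Agent, social_observer_score, and evolve are illustrative stand-ins.
import random

class Agent:
    def __init__(self, behavior_bias: float):
        # behavior_bias stands in for an LLM agent's learned behavioral tendencies
        self.behavior_bias = behavior_bias

    def act(self) -> float:
        # an action drawn around the agent's bias, standing in for text-based behavior
        return self.behavior_bias + random.gauss(0, 0.1)

def social_observer_score(action: float, norm: float) -> float:
    """Score how closely an action matches the prevailing social norm (higher is better)."""
    return -abs(action - norm)

def evolve(agents: list, norm: float, keep: float = 0.5) -> list:
    """Keep the better-aligned part of the population and let it 'reproduce' with small mutations."""
    ranked = sorted(agents, key=lambda a: social_observer_score(a.act(), norm), reverse=True)
    survivors = ranked[: int(len(ranked) * keep)]
    children = [Agent(a.behavior_bias + random.gauss(0, 0.05)) for a in survivors]
    return survivors + children

# The prevailing norm drifts each generation; the surviving population tracks it.
agents = [Agent(random.uniform(-1, 1)) for _ in range(20)]
norm = 0.0
for generation in range(10):
    norm += 0.1
    agents = evolve(agents, norm)
```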
Implications: This paper grapples with the difficulty of aligning agentic AI systems due to external feedback from the environment as well as changing social norms, demonstrating Chinese research efforts on the complex problem of agent alignment.
Research group explores backdoor attack and defense
Background: On January 11, a research group from Nanjing University, the Chinese Academy of Sciences Institute of Automation, and the University of Science and Technology of China published a preprint on backdoor attack and defense on model adaptation. The group was led by TAN Tieniu (谭铁牛), a leading Chinese computer scientist, academician of the Chinese Academy of Sciences, and the current Party Secretary of Nanjing University.
Research content: The authors explored security risks in model adaptation, a paradigm that relies only on pre-trained source models rather than the original training data. The research focuses on backdoor attacks delivered through poisoned, unlabeled data, which the authors find still pose substantial risks. The researchers therefore developed a method called MIXADAPT to defend against backdoor threats, which “eliminates the mapping between the backdoor trigger and the target class by mixing semantically irrelevant areas among target samples.” The authors claim that this is the first investigation of unsupervised backdoor attacks on adaptation tasks and find that their defense method is effective.
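As a rough illustration of the patch-mixing idea quoted above (not the paper’s implementation), the sketch below swaps regions judged “semantically irrelevant” between unlabeled target samples, so that any hidden trigger no longer co-occurs reliably with a single target class. The low-variance saliency heuristic, the patch size, and the function names are assumptions made for this example.

```python
# An illustrative sketch (not the paper's implementation) of patch mixing between
# unlabeled target samples; the low-variance "irrelevance" heuristic and all names
# here are assumptions for illustration.
import numpy as np

def least_salient_patch(image: np.ndarray, patch: int) -> tuple:
    """Return the top-left corner of the lowest-variance patch, a crude proxy for a
    semantically irrelevant region."""
    h, w, _ = image.shape
    best, best_var = (0, 0), np.inf
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            var = image[y:y + patch, x:x + patch].var()
            if var < best_var:
                best, best_var = (y, x), var
    return best

def mix_irrelevant_regions(batch: np.ndarray, patch: int = 8) -> np.ndarray:
    """Swap low-salience patches between randomly paired target samples, weakening any
    fixed mapping between a hidden trigger and one target class."""
    mixed = batch.copy()
    partners = np.random.permutation(len(batch))
    for i, j in enumerate(partners):
        y1, x1 = least_salient_patch(batch[i], patch)
        y2, x2 = least_salient_patch(batch[j], patch)
        mixed[i, y1:y1 + patch, x1:x1 + patch] = batch[j, y2:y2 + patch, x2:x2 + patch]
    return mixed

# Usage: apply to each unlabeled target batch before adaptation updates.
batch = np.random.rand(16, 32, 32, 3)
adapted_inputs = mix_irrelevant_regions(batch)
```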
Implications: Dr. Tan also co-authored a paper at ICCV 2023 on the security of model adaptation. It is not clear how directly involved he is with these papers, but if he is interested in AI security issues, that could foster further work on the subject given his seniority and position in Nanjing University leadership.
Zhongguancun Lab, Tsinghua, and others assess LLM risks
Background: On January 11, a research group from Zhongguancun Laboratory, Tsinghua University, the Chinese Academy of Sciences, and Ant Group published a preprint titled “Risk Taxonomy, Mitigation, and Assessment Benchmarks of Large Language Model Systems.” The group was led by professors LI Qi (李琦) and XU Ke (徐恪), both at Tsinghua and Zhongguancun Lab. Zhongguancun Lab is a new national lab created in 2021 and led by academician of the Chinese Academy of Engineering and Tsinghua professor WU Jianping (吴建平). The lab focuses on the internet and information domain, conducting strategic, forward-looking, and foundational research and cooperating with institutions inside and outside of China.
LLM risks: The researchers create a taxonomy of AI risks using a module-oriented approach, dividing up LLM systems into four modules – “an input module for receiving prompts, a language model trained on vast datasets, a toolchain module for development and deployment, and an output module for exporting LLM-generated content.” In total, the researchers focus on 12 specific risks, differentiated further into 44 sub-categories. The risks most relevant to frontier AI safety include adversarial prompts (e.g. jailbreaks, goal hijacking) at the input module level, model attacks (e.g. poisoning attacks, inference attacks, novel attacks) at the language model module level, challenges with external tools (e.g. exploiting web APIs for attacks) at the toolchain module level, and unhelpful uses (e.g. cyberattacks, software vulnerabilities) at the output module level. The authors also survey mitigation strategies such as watermarking, defenses against different model attacks, and detection of malicious prompts.
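For readers who want the structure of the taxonomy at a glance, the following condensed sketch encodes the four modules and only the example risks named in this summary; the full paper covers 12 risks and 44 sub-categories, and the naming below is a paraphrase rather than the paper’s exact schema.

```python
# A condensed encoding of the module-oriented taxonomy, limited to the example risks
# named in this summary; names are paraphrased, and the full paper covers 12 risks
# and 44 sub-categories.
LLM_SYSTEM_RISK_TAXONOMY = {
    "input module": {
        "adversarial prompts": ["jailbreaks", "goal hijacking"],
    },
    "language model": {
        "model attacks": ["poisoning attacks", "inference attacks", "novel attacks"],
    },
    "toolchain module": {
        "challenges with external tools": ["exploiting web APIs for attacks"],
    },
    "output module": {
        "unhelpful uses": ["cyberattacks", "software vulnerabilities"],
    },
}

def risks_for_module(module: str) -> list:
    """Flatten the sub-category risks recorded for one module of an LLM system."""
    return [risk for risks in LLM_SYSTEM_RISK_TAXONOMY.get(module, {}).values() for risk in risks]

print(risks_for_module("input module"))  # ['jailbreaks', 'goal hijacking']
```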
Implications: This paper mostly focuses on LLM risks relevant for current applications, such as factuality and toxicity of content, data leakage, and security against external attacks. There is less focus on aligning models with human values, explaining the mechanistic internal behaviors of models, and red-teaming or external auditing of models (beyond bias). While this paper offers an interesting categorization of risks and runs through the different phases of the model chain, additional work would be helpful for categorizing risks from frontier models. The recent emergence of Zhongguancun Lab as a national laboratory in this space adds another burgeoning AI player that has expressed interest in safety issues.
Expert Views on AI Risks
BIGAI director gives speech on AGI to Chinese political body
Background: Peking University (PKU) Chair Professor, Director of the PKU Institute for Artificial Intelligence, and Director of the Beijing Institute for General Artificial Intelligence (BIGAI) ZHU Songchun (朱松纯) gave a speech discussing AGI at an event hosted by the Chinese People's Political Consultative Conference (CPPCC). The CPPCC provides political advice to the Chinese government and includes a number of intellectuals as members, but it does not have direct policymaking power.2 Professor Zhu is a CPPCC member and favors a “small data for big tasks” paradigm for AI research. Several Vice-Chairs of the CPPCC attended his talk, as did former Minister of Science and Technology WANG Zhigang (王志刚), who is now a CPPCC member.
Discussion of AGI: Topics of discussion during the talk included “what is AGI,” “what are the paths to realizing AGI,” “is there a risk of loss of control of AI,” and “how can China gain competitive advantages in AI development.” According to the readout, safety/security was discussed frequently. Dr. Zhu argued that much of the information generated by GPT-4 is untrustworthy and that it lacks an understanding of human intentions, which could lead to challenges for societal safety and ethics. He stated that he hopes to give AGI a value system and cognitive structure, creating a “heart” in the machine, which he believes is necessary for machines to gain credibility with humans. He believes that AGI should be values-driven rather than data-driven, based on the values of traditional Chinese philosophy, and aligned with human values and norms.
Implications: Zhu’s speech is part of a trend of increasing discussion of AGI development and risks in government venues by expert scientists. This speech and Zhu’s previous research on topics related to alignment suggest sympathy with concerns around frontier AI risks. Further discussion of the risks of AI by prominent scientists in government venues will help expand the scientific consensus on AI risks into an international political consensus.
Concordia AI’s Recent Work
Concordia AI published our 2023 Annual Review last week. You can reference it to learn about what we were up to last year!
We released a draft Chinese report for expert feedback titled “Best Practices for Frontier AI Safety: Research and Development Practices and Policy Construction Guide for Chinese Institutions” at the Frontier AI Safety and Governance sub-forum of the International AI Cooperation and Governance Forum last December. We just published the finalized version of the 70+ page report on January 17, incorporating that expert feedback.
On January 15, we published the “Safety and Global Governance of Generative AI” report in English and Chinese, with 29 essays from over 40 policymakers, industry practitioners, and experts inside and outside China. The essays analyzed the risks and benefits of generative AI from the perspectives of global governance, developing countries, engineering, and companies. We were commissioned by the Shenzhen Association for Science and Technology and the World Federation of Engineering Organizations - Committee on Engineering for Innovative Technologies (WFEO-CEIT) to serve as Chief Editor of the report.
Feedback and Suggestions
Please reach out to us at info@concordia-ai.com if you have any feedback, comments, or suggestions for topics for the newsletter to cover.
CAICT’s Chinese name is 中国信息通信研究院.