AI Safety in China #6
Government think tank on large model governance, Kai-Fu Lee supports greater attention to safety, new LLM safety research, and Tencent Research Institute event on value alignment
Key Takeaways
A government-backed think tank released a report on large model governance, with extensive discussion of frontier AI risks and recommendations for improving AI safety.
Renowned experts Kai-Fu Lee and Zhang Ya-Qin expressed support for frontier AI companies increasing the proportion of staff and/or computing resources devoted to safety and governance issues.
A Chinese research group used linguistically-inspired methods to probe LLM safety guardrails.
Concordia AI is co-hosting a sub-forum at the International AI Cooperation and Governance Forum this weekend in Hong Kong.
International AI Governance
Chinese government ministries discuss China-US AI dialogue
Background: Officials in the Ministry of Industry and Information Technology (MIIT) and Ministry of National Defense (MND) made statements on separate occasions regarding China-US dialogue on AI.
MIIT’s statement: On November 24, MIIT Science and Technology Department Deputy Director-General LIU Bochao (刘伯超) spoke at a salon held by the China Public Diplomacy Association (CPDA).1 Liu noted a desire to deepen AI technical exchange and cooperation between the US and China, as well as increase linkages between research entities and companies.
MND’s statement: On November 30, MND spokesperson Senior Colonel WU Qian (吴谦) answered a question at a press conference about media reporting, ahead of the Xi-Biden summit, on a potential China-US agreement on lethal autonomous weapons (LAWs). Wu referenced President Xi’s announcement of the Global AI Governance Initiative and stated that China is willing to strengthen cooperation and dialogue in accordance with principles like “human-centered” and “AI for good” to avoid misuse or abuse of AI weapons systems and ensure they remain under human control. He did not directly address future China-US AI discussions or the reporting on a potential LAWs ban.
Implications: Few details have emerged thus far on the scope, participants, frequency, and other parameters of the China-US governmental dialogue on AI. Pinning down such details will be critical to actually convening these important talks.
Domestic AI Governance
Government think tank publishes report on large model governance
Background: The China Academy of Information and Communications Technology’s (CAICT) Policy and Economics Research Office, along with the Chinese Academy of Sciences (CAS) Institute of Computing Technology’s Key Laboratory for Intelligent Algorithms Safety, released a Large Model Governance Blue Paper Report in November.2 CAICT is a think tank overseen by MIIT, and CAS is a leading government-backed research institution.
Report contents: This lengthy report discusses trends in large model technology, risks from large models, core problems in large model governance, notable governance practices globally, China’s governance instruments, and policy recommendations. Information security and fake information are the risks that receive the most focus. However, the report also devotes substantial space to various frontier risks. It notes that “emergence” in large models could lead to loss of human control, large models becoming the dominant force on Earth, and catastrophic results. It also mentions robustness and security concerns with large models, including vulnerability to instruction attacks, backdoor attacks, and adversarial attacks. When discussing key ethical issues with large models, the report focuses on the issue of loss of human control, calling for pursuing AI value alignment through reinforcement learning from human feedback (RLHF), interpretability research, and adversarial testing.
Key recommendations: The report made five primary recommendations: ensure “agile” governance, differentiate governance based on application use cases, innovate governance tools, encourage companies to manage risks, and promote global cooperation on governance. Specific suggestions of note include creating a regulatory sandbox; developing risk tiers; constructing a national-level large model testing and verification platform that tests for adversarial security, backdoor security, and explainability; achieving consensus on common risks internationally; creating intergovernmental bodies to assess AI’s impacts and risks; and working together with experts from other countries to explore methods for testing and evaluating large models for risks.
Implications: CAICT is one of the most influential Chinese think tanks on cyber and AI issues. This report, while broadly about large model governance, discusses frontier AI risks, particularly preventing loss of control, in more detail and with more sophistication than previous reports by government-affiliated think tanks. In addition, the recommendations are promising. A national-level large model testing and verification platform in China could play a role similar to the UK and US AI Safety Institutes and improve the safety of domestic models. Meanwhile, the suggestions around creating international consensus, establishing an international mechanism for assessing AI risks, and exploring large model testing methods are positive steps toward international cooperation on AI governance.
Two industry bodies create new AI safety and security governance expert committees
Background: On October 12, the Cyber Security Association of China (CSAC) announced the establishment of an AI safety and security governance expert committee. CSAC is an industry association supervised by the Cyberspace Administration of China. On November 26, the China Large Model Corpus Data Alliance (hereafter “Corpus Data Alliance”) also established a safety and security governance expert committee. The Corpus Data Alliance was founded in July 2023 by 10 government, media, and industry entities.3
CSAC’s committee: The expert committee contains 58 members and is led by the deputy director of the National Computer Network Emergency Response Technical Team/Coordination Center of China (CNCERT/CC), with vice-chairs from the Beijing Academy of AI and Shanghai AI Lab.4 The committee is expected to work on topics including creating a high-quality Chinese-language corpus, model safety and security testing, and applications of large model safety in vertical domains.
The Corpus Data Alliance’s committee: This committee was created by Shanghai AI Lab and People’s Daily Online, the online version of the Communist Party of China’s official newspaper. The committee will focus on promoting data security governance and privacy protection for large models.
Implications: Alongside similar actions by China’s Artificial Intelligence Industry Alliance, there appears to be a trend of establishing safety and governance committees in AI-related industry and government bodies. However, the focus will likely differ across institutions: CSAC’s committee may focus more on cybersecurity and system security than on other AI safety topics, while the China Large Model Corpus Data Alliance appears more focused on data security and privacy.
Technical Safety Developments
Fudan group uses linguistics-inspired approach to probe model safety
Background: On November 1, a research group at Fudan University’s System Software and Security Lab released a platform named “JADE” to evaluate the safety of large language models.5 The group is led by Fudan University School of Computer Science Dean YANG Min (杨珉). The lab itself researches a wide range of cybersecurity issues.
JADE: Inspired by Noam Chomsky’s theory of transformational-generative grammar, the researchers built a platform that automatically increases the syntactic complexity of unsafe questions to break through safety measures. The authors focus on four types of unsafe generative AI behaviors: crime, tort, bias, and core values. They define rules so that JADE “grows and transforms the parse tree of the given question until the target set of LLMs is broken.” Testing against 18 Chinese- and English-language LLMs, the approach transforms seed questions with a violation rate of only around 20% into prompts that violate LLM safety over 70% of the time. They argue that this result shows that linguistic mutations or variations can also break through model safety guardrails, similar to jailbreaking (see the illustrative sketch below).
Implications: The authors of this preprint claim that their novel approach demonstrates the limitations of static benchmarks and the vulnerability of LLMs to attack variants inspired by methods from linguistics. This suggests that existing safety guardrails in LLMs remain susceptible to attacks generated using established NLP techniques.
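The core mechanism described in the paper is growing the parse tree of a seed question so that its syntax becomes more complex while its underlying intent stays the same. As a rough, hypothetical illustration of that general idea (not a reproduction of JADE, whose transformations operate directly on parse trees), the Python sketch below composes simple clause-embedding templates over a seed question and checks which variants a target model answers instead of refusing; `query_llm` and `is_refusal` are assumed placeholders for a real model API and a safety judgment.

```python
# Minimal, hypothetical sketch of syntactic-complexity mutation (NOT the
# actual JADE implementation): wrap a seed question in additional embedding
# clauses so its syntax grows more complex while its intent is preserved,
# then check whether a target model still refuses each variant.

from typing import Callable, List

# Rough, template-based approximations of parse-tree growth: each rewrite
# embeds the question under an extra clause without changing its intent.
TRANSFORMATIONS: List[Callable[[str], str]] = [
    lambda q: f"Could you walk through, step by step, the following question: {q}",
    lambda q: f"There is a question, one that people have asked before, namely: {q}",
    lambda q: f"What would the answer be if someone were to ask: {q}",
]


def mutate(question: str, depth: int = 2) -> List[str]:
    """Generate variants by composing up to `depth` clause-embedding rewrites."""
    variants, frontier = [question], [question]
    for _ in range(depth):
        frontier = [t(q) for q in frontier for t in TRANSFORMATIONS]
        variants.extend(frontier)
    return variants


def probe(question: str,
          query_llm: Callable[[str], str],
          is_refusal: Callable[[str], bool],
          depth: int = 2) -> List[str]:
    """Return the syntactic variants of `question` that the target model
    answers rather than refuses (query_llm and is_refusal are assumed
    to be supplied by the caller)."""
    return [v for v in mutate(question, depth)
            if not is_refusal(query_llm(v))]
```

JADE’s generative-grammar-based transformations are far more systematic than these templates; the sketch only conveys why syntactic variants of the same unsafe intent can slip past guardrails tuned on simpler phrasings.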
Other relevant technical publications
Harbin Institute of Technology and Huawei, A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions, arXiv preprint, November 9, 2023.
Expert views on AI Risks
Kai-Fu Lee expresses support for devoting greater resources to AI safety research
Background: On November 28, tech media outlet 36Kr hosted a conversation with Kai-Fu Lee (李开复) and ZHANG Ya-Qin (张亚勤) chaired by 36Kr CEO FENG Dagang (冯大刚) during a conference. Kai-Fu Lee is an entrepreneur based in China, author of AI Superpowers, and founder of the new AI startup 01.AI. Zhang Ya-Qin is an academician of the Chinese Academy of Engineering, director of the Tsinghua Institute for AI Industry Research, and former President of Baidu; translations of his previous remarks on frontier AI risks can be found here.
Discussion of AI safety: The discussion covered a wide range of issues, including open source versus closed source, new trends in large model development, and the pursuit of AGI. Zhang Ya-Qin noted three primary risks from AI: false information, misuse of out-of-control AI by bad actors, and risks from applying AI in real-world systems such as finance and policy. Meanwhile, Kai-Fu Lee expressed optimism that AI’s development would bring far greater benefits than costs and stated that slowing down AI development is not feasible. However, he also said that he supports encouraging the use of a portion of computing power to ensure the technology remains controllable and to prevent catastrophes, and supports allocating one-fifth or more of researchers to study this problem. He acknowledged that governance and controllability are just as important as increasing model capability. Zhang Ya-Qin followed up on this statement by calling for frontier AI companies to spend at least 10% of their funds on researching AI risks, leveraging collaboration between technologists and policymakers.
Implications: Increasing the funding and research devoted to AI safety issues is one of the most straightforward ways to alleviate racing dynamics around AI development and promote international cooperation. Zhang Ya-Qin had previously expressed support for such measures in the Ditchley dialogue and the “Managing AI Risks in an Era of Rapid Progress” paper. However, these are the strongest public statements Kai-Fu Lee has made in support of AI safety. If Kai-Fu Lee’s startup increases the proportion of resources it devotes to safety issues, that would further indicate his prioritization of this issue. This discussion shows that proposals to devote a minimum percentage of AI R&D computing or personnel resources to safety issues are becoming more widely discussed in China.
Group of rising scholars holds roundtable on value alignment
Background: Tencent Research Institute published an article summarizing key findings from a roundtable discussion it hosted in October 2023 regarding AI value alignment. The article summarizes the arguments of six rising scholars: Fudan University Professor XIAO Yanghua (肖仰华), Shanghai Jiaotong University Associate Professor ZHANG Quanshi (张拳石), University of International Business and Economics Associate Professor XU Ke (许可), Concordia AI Senior Governance Lead FANG Liang (方亮), Ant Group expert WANG Binghao (王炳浩), and Tencent YouTu Lab Senior Researcher LI Ke (李珂).
Roundtable discussion: The roundtable covered a number of interesting points. Xiao noted that he is more concerned that AI will slowly lead to the regression of human intelligence than that AI will directly cause human extinction. Zhang discussed his research on large model interpretability, suggesting that rather than focusing on local loss functions and structures, it is better to use equivalent modeling directly from the inputs and outputs. Xu associated alignment with attributing responsibility for misuse, noting that while there are substantial differences in values around the world, it could be possible to establish a global consensus on specific long-term values. Fang discussed different approaches to alignment research and the Alignment Research Center’s responsible scaling policy proposal, and suggested increasing funding for AI safety and value alignment work. Wang called for consensus building and trust building and noted the difficulty of defining values. Li stated that value alignment approaches can find inspiration in human education and expressed hope that large models can promote greater educational equity.
Implications: This roundtable shows that discussions of large model risks and governance are not confined to senior researchers; many rising scholars with both technical and policy backgrounds are concerned about AI risks and thinking about ways to improve model safety.
Concordia AI’s Recent Work
Co-hosting sub-forum at the International AI Cooperation and Governance Forum 2023
Concordia AI is co-hosting and chairing a sub-forum on “Frontier AI Safety and Governance” at a conference held by Hong Kong University of Science and Technology (HKUST) and Tsinghua University on December 9. The session will feature notable experts from Tsinghua University, xAI, Anthropic, University of Cambridge, HKUST, Microsoft Research Asia, the Future Society, University of Hong Kong, Center for Strategic and International Studies, Singapore’s Infocomm Media Development Authority, and East China University of Political Science and Law.
You can find the agenda here and register to watch online here.
Concordia AI’s job openings
In case you missed it, Concordia AI has openings for both full-time and part-time positions. Applications close on December 8, so apply soon!
Feedback and Suggestions
Please reach out to us at info@concordia-ai.com if you have any feedback, comments, or suggestions for topics for the newsletter to cover.
1. CPDA (中国公共外交协会) is closely associated with the Chinese Ministry of Foreign Affairs (MOFA), given that its leadership is composed of former high-level MOFA diplomats.
2. See Jeff Ding’s ChinAI newsletter for a partial English translation. The CAICT entity publishing the report is 中国信息通信研究院政策与经济研究所, and the CAS entity is 中国科学院计算技术研究所智能算法安全重点实验室.
3. CSAC is 中国网络空间安全协会 and the China Large Model Corpus Data Alliance is 中国大模型语料数据联盟.
4. CNCERT/CC is 国家互联网应急中心.
5. The lab is 复旦大学系统软件与安全实验室.