AI Safety in China #17
Proposal for global AI capacity building, China-US dialogue updates, new AI safety governance framework, paper on LLM risks in science
Key Takeaways
China announced an AI capacity building project directed at Global South countries at the UN Summit of the Future.
The Chinese and US governments indicated that a second round of intergovernmental dialogue on AI is likely after the US national security advisor’s trip to China.
A Chinese standards body issued China’s first AI Safety Governance Framework with substantial treatment of frontier AI risks.
Recent Chinese technical AI safety papers include work on “weak-to-strong deception,” benchmarking LLM risks in science, and assessing which layers at the parameter level are most important for AI safety.
A Chinese academician and former cybersecurity official spoke on the need for further technical AI safety research.
Note: This newsletter edition was delayed due to our publication of an in-depth analysis of China's AI Safety Evaluations Ecosystem. The summaries in this newsletter are shorter than usual to accommodate the large number of updates since the last edition, and we plan to resume a regular publication schedule for the newsletter.
International AI Governance
China announces AI capacity building project at the UN
Chinese Politburo Member and Foreign Minister WANG Yi (王毅) gave speeches at the UN Summit of the Future on September 23 (Ch, En) and UN General Assembly (UNGA) meeting on September 28 (Ch, En), referencing AI in both speeches. At the Summit of the Future, Wang announced China’s AI Capacity-Building Action Plan for Good and for All (Ch, En), which was formally published on September 27. The plan focuses on increasing Chinese cooperation with the Global South on AI development, including cooperation on AI R&D and application, exchange programs, promoting AI literacy, and developing AI data resources. The plan also articulated support for establishing “global, interoperable AI risk assessment frameworks, standards and governance” that address interests of developing countries and pursuing joint AI risk assessments.
Top Chinese and US leaders discuss next round of intergovernmental AI talks
On August 27-29, US National Security Advisor Jake Sullivan visited China and met with Chinese President XI Jinping as well as Politburo member and Foreign Minister WANG Yi. According to the US readout, Sullivan discussed implementation of commitments on AI safety and risk in the meeting with President Xi. According to the Chinese readout (Ch, En), Wang and Sullivan agreed to hold the second round of intergovernmental talks on AI “at an appropriate time,” and Sullivan also referenced working towards “another round of AI safety and risk talks” at a press conference.
China-BRICS AI cooperation center launched
On July 19, the China-BRICS AI Development and Cooperation Center was launched in Beijing, with attendance by Chinese Vice Minister of the Ministry of Industry and Information Technology (MIIT) SHAN Zhongde (单忠德), China’s BRICS envoy, and ambassadors from several BRICS countries. Vice Minister Shan suggested that the center cooperate on technical research and innovation, creating an industry cooperation ecosystem, and developing AI standards, governance experiences, as well as a “BRICS plan” for AI governance. The Center’s work will be carried out by the MIIT-overseen China Academy of Information and Communications Technology (CAICT). On August 6, the Center issued a call for companies, universities, and research institutions to submit BRICS AI cooperation cases in the fields of technical cooperation, industry projects, applications, and governance.
China announces plans to deepen digital cooperation with African states
The Forum on China-Africa Digital Cooperation was held in Beijing on July 29, with participation of MIIT Vice Minister JIN Zhuanglong (金壮龙) and representatives from more than 40 African countries. The attendees jointly published the China-Africa Digital Cooperation Development Action Plan during the forum. The plan calls for exploring cooperation in fields including AI and calls for creating an AI cooperation center.
China to expand dialogue with Russia on AI
On August 21, the Chinese and Russian premiers met in Moscow and published a joint communique. The communique stated that China and Russia would establish an AI working group under the regular premier-level meeting committee to pursue cooperation and sharing of best practices on AI ethics, governance, industry, and applications. The document also noted support for expanded BRICS cooperation on AI through the China-BRICS AI Development and Cooperation Center.
Domestic AI Governance
Standards body issues AI safety governance framework including frontier risks
On September 9, during the Cyberspace Administration of China’s (CAC) Cybersecurity Publicity Week Forum, standards committee TC260 published an AI Safety Governance Framework (Ch, En) as part of the implementation of China’s Global AI Governance Initiative.1 The framework provides an extensive classification of AI safety risks, discussion of measures to address risks, and several safety guidelines. It categorizes risks into three “inherent safety risks” — risks from models and algorithms, risks from data, and risks from AI systems — as well as four “safety risks in AI applications” — cyberspace risks, real-world risks, cognitive risks, and ethical risks. Most relevant to frontier AI safety, the document lists risks such as misuse in dual-use domains of biological weapons and cyberattacks, as well as the possibility of AI becoming uncontrollable in the future. The technical measures it recommends to address AI risks largely lack provisions targeted at these frontier risks. Nevertheless, the document calls for advancing research on AI explainability, tracking and sharing information about AI safety incidents, and promoting international cooperation to develop standards on AI safety.
CAC issues draft document on watermarking AI-generated content
On September 14, the CAC published a draft plan on generative AI content watermarking for public comment. The draft seeks to implement regulations requiring both visible and invisible watermarks for AI-generated text, image, audio, video, and other content. Compared to a 2023 watermarking standard, it adds requirements for online content distribution platforms to take measures to regulate dissemination of such content, including adding warning labels to synthetic content.
Head of China’s internet regulator discusses AI supervision policies
On August 12, CAC director ZHUANG Rongwen (庄荣文) gave an interview to Xinhua News discussing the implications of China’s Third Plenum decision on cyberspace governance. On AI, Zhuang stated that China will seek to:
strengthen development of autonomous large models and computing capabilities;
guide application development in various sectors;
maintain safety/security bottom lines by developing standards for safety testing and emergency response, and prevent risks of misuse such as leaking personal information, generating fake information, and violating IP rights.
Beijing and Shanghai local governments create AI safety and governance research institutions
Multiple bureaus within the Shanghai and Beijing local governments established respective AI safety and governance organizations on July 8 and September 3, the Shanghai AI Safety and Governance Laboratory and the Beijing Institute for AI Safety and Governance. The Shanghai lab is created in a partnership between Shanghai AI Lab’s Governance Research Center and the Shanghai Information Security Testing Evaluation and Certification Center. The Beijing institute is led by Chinese Academy of Sciences professor and AI ethics and safety expert ZENG Yi (曾毅), and the deputy director is WEI Kai (魏凯), director of the MIIT-backed CAICT AI Research Institute.
Beijing local government releases action plan with AI safety provision
On July 26, the city of Beijing released an action plan for pursuing the “AI+” initiative over 2024 and 2025. The plan focuses primarily on enhancing Beijing’s position in various AI application fields, such as robotics, education, and healthcare. It includes one paragraph on AI safety and security measures, including calling for developing safety testing standards, conducting AI ethical alignment testing, and developing a risk assessment warning system.
State-backed think tank publishes report on AI safety research
On September 23, Alibaba Cloud and the CAICT Safety and Security Research Department published the Foundation Model Safety Research Report 2024. This report has extensive treatment of many different aspects of AI safety and risks, approaching the issue from four levels: training data, algorithm and model, system and platform, and services and application. It lists generating content that violates public safety or ethics, as well as misuse by users, as major application-level risks. It also lists controllability as a key aspect of model safety. The report recommends a number of safety measures, including creating safety datasets to align models to human values, testing models for robustness, and improving interpretability of models. Looking forward, the paper voices concerns about autonomous robots and LLM applications in the financial, chemical, and nuclear industries. Internationally, it recommends developing governance frameworks with global consensus and regional guides for AI governance.
Technical Safety Developments
SciSafeEval: A Comprehensive Benchmark for Safety Alignment of Large Language Models in Scientific Tasks
October 2: This preprint was supervised by Zhejiang University Assistant Professor ZHANG Qiang (张强), and it involved collaboration among 10 different institutions in China, Australia, the UK, US, and Singapore. The paper creates a benchmark to evaluate safety of LLMs across four different scientific languages: textual, molecular, protein, and genomic. The benchmark contains 31,840 questions across chemistry, biology, medicine, and physics, including a jailbreak enhancement feature to test models with safety guardrails. Risks encompassed by the benchmark include toxic chemical compounds, viruses, addictive drugs, and knowledge of nuclear physics.
Safety Layers in Aligned Large Language Models: The Key to LLM Security
August 30: This preprint was authored by researchers from Alibaba and the University of Science and Technology of China (USTC), anchored by Alibaba DAMO Academy researcher LI Yaliang and USTC Professor ZHANG Lan (张兰). The paper argues that a small set of layers in the middle of the model at the parameter level are crucial to AI safety. It also develops methods to fine-tune those layers to improve AI safety.
Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization
June 17: This preprint was co-authored by Renmin University Gaoling School of AI Executive Dean WEN Jirong (文继荣) and other researchers from Renmin University and WeChat AI. It explores whether strong models can deceive weak models by showing aligned behavior in areas known to the weak model but unaligned behavior in areas not known to the weak model. It finds that such weak-to-strong deception exists when exploring a multi-objective alignment case where some alignment targets conflict with each other.
Benchmarking Trustworthiness of Multimodal Large Language Models: A Comprehensive Study
June 11: This preprint is anchored by Tsinghua University Professor and AI security startup RealAI co-founder ZHU Jun (朱军), with additional participants from Beihang University and Shanghai Jiao Tong University. The authors developed a benchmark named MultiTrust to evaluate LLM trustworthiness across five dimensions of truthfulness, safety, robustness, fairness, and privacy. The researchers additionally released a scalable toolbox for trustworthiness research.
Other relevant technical publications
Huazhong University of Science and Technology, Beihang University, and Griffith University, BadRobot: Manipulating Embodied LLMs in the Physical World, preprint, July 16, 2024.
Shanghai AI Lab, Macau University of Science and Technology, Eversec Technology, and Chinese Academy of Sciences, The Shadow of Fraud: The Emerging Danger of AI-powered Social Engineering and its Possible Cure, preprint, July 22, 2024.
Singapore Management University, Fudan University, University of Melbourne, BackdoorLLM: A Comprehensive Benchmark for Backdoor Attacks on Large Language Models, preprint, August 23, 2024.
Jilin University et al., XTRUST: On the Multilingual Trustworthiness of Large Language Models, preprint, September 24, 2024.
Chinese Academy of Sciences, Holistic Automated Red Teaming for Large Language Models through Top-Down Test Case Generation and Multi-turn Interaction, preprint, September 25, 2024.
Chinese University of Hong Kong et al., A Survey on the Honesty of Large Language Models, preprint, September 27, 2024.
Chinese Academy of Sciences, Beijing Institute of AI Safety and Governance, and Center for Long-term Artificial Intelligence, Jailbreak Antidote: Runtime Safety-Utility Balance via Sparse Representation Adjustment in Large Language Models, October 3, 2024.
Expert views on AI Risks
AI expert dialogue signs consensus statement calling for three new processes to combat catastrophic risks
The International Dialogues on AI Safety (IDAIS) held its third meeting in Venice from September 5 to 8, convened again by leading AI scientists Yoshua Bengio, Andrew Yao (姚期智), Geoffrey Hinton, ZHANG Ya-Qin (张亚勤), and Stuart Russell. Additional participants and signatories included representatives from the Chinese Academy of Sciences, Beijing Academy of AI, Shanghai AI Lab, Singapore AI Safety Institute, Anthropic, Chinese startup Zhipu AI, and The Elders. Concordia AI covered the previous two iterations of the dialogue in Issue #4 and #13. The consensus statement called for states to develop domestic authorities to detect and respond to catastrophic AI risks while also developing an international governance regime. It suggested setting up three processes:
emergency preparedness agreements and institutions for coordination among domestic AI safety authorities;
safety assurance frameworks where frontier AI developers demonstrate to domestic authorities that they are not crossing red lines through evaluations, safety cases, and other means;
global AI safety and verification research funds supported by states, philanthropists, corporations, and experts to grow independent research capacity on key technical problems.
International group of AI experts call for an inclusive approach to AI policymaking
On September 26, a group of 21 AI researchers and policy experts, led by Turing Award Winner Yoshua Bengio and former White House Office of Science and Technology Policy acting director Alondra Nelson, signed the Manhattan Declaration on Inclusive Global Scientific Understanding of Artificial Intelligence. There were three Chinese signatories: Chinese Academy of Sciences Professor ZENG Yi, Alibaba Cloud founder WANG Jian, and Concordia AI CEO Brian Tse. The declaration focused on ten key suggestions, including global scientific cooperation, mitigating ongoing AI harms and anticipated risks, promoting interdisciplinary collaboration, and evidence-based governance frameworks.
Chinese academician analyzes AI safety research landscape
Academician and former head of the National Administration of State Secrets Protection WU Shizhong (吴世忠) gave a talk on AI safety and security at the 12th Internet Security Conference on July 31. Wu noted based on an analysis of arXiv papers that AI safety research lags behind significantly compared to technological innovation research, but safety research has become increasingly popular since the development of ChatGPT. Therefore, Wu proposed that AI safety research should study the safety of AI models, security against attacks, and incorporating ethics. He also called for further work on interpretability, diversifying safety technologies, ethical issues, and safety testing.
State media reports on risks of AI agents
On July 17, Xinhua News, China’s premier state media outlet, published an article on the risks of AI agents. The article discussed the risk of AI agents deceiving humans, subverting safety measures, and escaping human control. The article references previous work on this topic by Harvard Professor Jonathan Zittrain and the Managing extreme AI risks amid rapid progress paper in Science co-authored by Turing Award Winners Yoshua Bengio, Geoffrey Hinton, Andrew Yao, and other notable scholars.
What else we’re reading
Karman Lucero, Managing the Sino-American AI Race, Project Syndicate, August 9, 2024.
Yanzi Xu and Daniel Castro, How Experts in China and the United Kingdom View AI Risks and Collaboration, Center for Data Innovation, August 12, 2024.
Nick Corvino and Boshen Li, Survey: How Do Elite Chinese Students Feel About the Risks of AI?, August 23, 2024.
Center for International Security and Strategy at Tsinghua University and Brookings Institution, Interim Findings on Artificial Intelligence Terms / Glossary of artificial intelligence terms, August 2024.
Jeffrey Ding, ChinAI #280: Sour or Sweet Grapes? The U.S.'s Unstrategic Approach to the "Chip War", ChinAI Newsletter, September 9, 2024.
Patrick Zhang, China Unveils AI Safety Governance Framework to Lead Global Standards, Geopolitechs, September 9, 2024.
Catherine Thorbecke, Where the US and China Can Find Common Ground on AI, Bloomberg, September 20, 2024.
Scott Singer, How the UK Should Engage China at AI’s Frontier, Carnegie Endowment for International Peace, October 18, 2024.
Concordia AI’s Recent Work
Concordia AI CEO Brian Tse and Senior Program Manager Kwan Yee Ng attended the UN Summit of the Future Action Days in September and side events hosted by the UN Office of the Secretary-General’s Envoy on Technology, OpenAI, the Partnership on AI, The Elders, and other organisations.
Concordia AI CEO Brian Tse was a speaker on a panel at the the Pacific Economic Cooperation Council’s seminar on the Responsible Adoption of General Purpose AI, an official partner event of the France AI Action Summit.
Concordia AI Senior Program Manager Kwan Yee Ng contributed to the Carnegie Endowment for International Peace’s paper The Future of International Scientific Assessments of AI’s Risks.
Concordia AI Senior Governance Lead FANG Liang participated in a seminar held by the AI Industry Alliance of China’s Safety and Security Governance Committee on AI risk management.
Feedback and Suggestions
Please reach out to us at info@concordia-ai.com if you have any feedback, comments, or suggestions for topics for the newsletter to cover.
TC260’s Chinese name is 全国网络安全标准化技术委员会. We have previously covered TC260’s work in the State of AI Safety in China report pages 13-15 and State of AI Safety in China Spring 2024 Report page 67.