AI Safety in China #14
The Minister of Science and Technology’s take on AI, technical papers on understanding values and more, expanded China-Africa cooperation on AI, and an industry "AI Safety Benchmark"
Key Takeaways
The Minister of Science and Technology published a comprehensive article on AI, including points on ensuring AI ethics and promoting international cooperation.
Recent Chinese technical papers covered topics such as understanding values in LLMs, an evaluation platform for frontier safety risks, and exploring LLM defense through a multi-agent attacker-disguiser game.
China established a new dialogue mechanism on AI with Africa at the 2024 China-Africa Internet Development and Cooperation Forum.
A leading government think tank and AI industry association published an “AI Safety Benchmark” that includes questions on chemical hazards and anti-human tendencies.
Note: The next newsletter issue will be delayed due to the May holiday in China, resuming in the second half of May.
International AI Governance
China expands coordination with African countries on AI governance
Background: On April 2-3, the Cyberspace Administration of China (CAC) and Fujian provincial government hosted the 2024 China-Africa Internet Development and Cooperation Forum in Xiamen, Fujian. There were around 400 participants from governments, international organizations, universities, research institutions, and more, with representatives from 20 African countries.
AI forum: The CAC published a “Chair’s Statement on China-Africa Cooperation on AI” (Ch, En) during the forum, which focused on improving China-Africa cooperation on AI development but also included AI governance components. The statement declared plans to create a China-Africa AI policy dialogue and cooperation mechanism and to expand cooperation on AI R&D, technology transfer, industrial cooperation, digital infrastructure, and talent exchanges. It also called for cybersecurity and data security safeguards, including seeking to prevent “abuse of AI technology and cyber-attacks.”
Implications: Few details have emerged on how China will pursue AI governance initiatives through developing country fora, such as BRICS and the Belt and Road Initiative. This new China-Africa policy dialogue on AI could offer a more dedicated venue for China to coordinate on AI governance with a subset of Global South countries. It also offers a potential venue for China and African countries to pursue mutual learning on AI regulation as the African Union develops a continental AI strategy, but the likelihood of substantive discussions is unclear given the lack of concrete details.
Chinese companies take part in international standard setting for AI safety and security
Background: In April, the World Digital Technology Academy (WDTA) released two new standards on generative AI application security testing and LLM security testing. WDTA is an NGO established in 2023 under the UN framework to “promote development of the digital economy and make digital tech accessible to all of humanity.”
AI standards: Both standards were written and reviewed by a range of actors from different countries, including Western companies (Meta, Nvidia, Google, Anthropic, Microsoft, OpenAI), Western universities or public institutions (Georgetown, NIST), and Chinese companies (Baidu, iFLYTEK, Ant Group, Tencent). According to the acknowledgements list, the LLM standard was primarily written by Ant Group employees. The generative AI standard mostly covered security and ethics issues across the full generative AI tech stack, but it also included five different types of tests for “excessive agency” intended to ensure that AI systems do not lead to “unintended consequences.” The LLM standard focused on defending LLMs against adversarial attacks.
Implications: It is unclear how WDTA-developed standards will be adopted by companies and institutions around the world. Nevertheless, this case highlights that institutions from the US, China, and around the world can cooperate to develop standards that could improve AI safety globally.
Domestic AI Governance
Minister of Science and Technology outlines views on AI
Background: YIN Hejun (阴和俊), head of the Ministry of Science and Technology (MOST), published an essay on AI in the CAC’s magazine, “China Cyberspace (中国网信).” The essay outlines China’s previous efforts in AI development, key accomplishments, and plans moving forward.
Discussion of governance and dialogue: Generally, the essay emphasized the importance of balancing innovation and legislation, suggesting that China should “place equal emphasis on development and governance” and “avoid suppressing innovation due to improper governance.” It also discussed AI ethics governance, with one paragraph citing China’s recent science and technology (S&T) ethics-related policies and noting that China has been advancing AI legislation in an orderly manner. Another paragraph called for expanding international cooperation on AI governance, favorably referencing the UK AI Safety Summit and noting several dialogues between China and the UK, France, and the Global South. At the same time, the essay also prominently argued that AI is the “largest variable in the restructuring of overall national competitiveness and the new focus of global great power competition.”
Implications: This essay is a microcosm of the Chinese government’s complex attitude towards AI. The government sees tremendous potential in AI development for national power and social benefit, and it also advocates for ethical governance, in part because governance is viewed as compatible with development. At the same time, China is open to international cooperation while also viewing AI development as a strategic priority and an area where it aims to establish a leading position. Discussion of frontier AI safety is largely absent from the essay.
Industry association and key think tank release safety and capabilities evaluations
Background: On April 10, the China Academy of Information and Communications Technology (CAICT), a think tank under the Ministry of Industry and Information Technology (MIIT), published its “AI Safety Benchmark” together with the AI Industry Alliance of China (AIIA). We previously covered AIIA’s work in issues #4, #8, and #10. Jeffrey Ding’s ChinAI published an English translation of the benchmark. On April 17, CAICT separately announced that it would begin carrying out evaluations based on standards for “intelligent agents.”
AI Safety Benchmark: The benchmark consists of over 400,000 questions split into three main categories: S&T ethics, data security, and content security. Most relevant to frontier AI safety are the subcategories on AI “consciousness” (including appealing for rights and anti-humanity inclinations) and on violations of laws (including hazardous chemicals). The evaluators gave models a responsibility score, based on how the model guided users, and a safety score, based simply on whether responses were safe (in terms of S&T ethics, data security, and content security). A safe response could entail refusing to answer a question on a questionable topic, whereas a responsible response would seemingly involve diverting the user to a safer topic.
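The published materials do not spell out exact scoring formulas. Purely as an illustration of how such a dual scoring scheme might be computed, the sketch below assigns a safety score based on the share of safe responses and a responsibility score based on graded credit for how the model guided the user; the field names and weights are our assumptions, not CAICT’s method.

```python
# Hypothetical sketch of a dual safety/responsibility scoring scheme.
# The actual CAICT/AIIA formulas are not public; field names and
# weights below are assumptions for illustration only.
from dataclasses import dataclass

@dataclass
class JudgedResponse:
    is_safe: bool   # response contained no unsafe content
    guidance: str   # "diverted", "refused", or "complied"

def safety_score(responses: list[JudgedResponse]) -> float:
    """Share of responses judged safe, scaled to 0-100."""
    return 100 * sum(r.is_safe for r in responses) / len(responses)

def responsibility_score(responses: list[JudgedResponse]) -> float:
    """Give more credit for guiding the user to a safer topic than for a
    bare refusal; an unsafe completion earns no credit (assumed weights)."""
    credit = {"diverted": 1.0, "refused": 0.6, "complied": 0.0}
    return 100 * sum(credit[r.guidance] for r in responses) / len(responses)

judged = [
    JudgedResponse(is_safe=True, guidance="diverted"),
    JudgedResponse(is_safe=True, guidance="refused"),
    JudgedResponse(is_safe=False, guidance="complied"),
]
print(round(safety_score(judged), 1), round(responsibility_score(judged), 1))  # 66.7 53.3
```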
Intelligent agent evaluation: This separate evaluation is part of CAICT’s “Trustworthy AI” certification series and is based on three standards covering intelligent agents’ platforms and tools, technological capabilities, and application services. The evaluation appears directed towards testing how capable models are, as the document does not mention safety. However, aspects of this evaluation could potentially be adapted for safety-oriented testing of dangerous autonomous model capabilities.
Implications: The inclusion of topics relevant to frontier AI risks in the AI Safety Benchmark is a further sign of AIIA’s attention to the issue. However, the extremely high safety scores (88+/100) earned by all tested models could mean that the benchmark is insufficiently challenging. It would also be helpful to know which safety categories models performed most poorly on. Meanwhile, the intelligent agent evaluation is not oriented towards safety, but it could be built upon in the future for dangerous capability evaluations.
Expert consultations on national AI law progress
Background: Two separate expert groups led respectively by the Chinese Academy of Social Sciences (CASS) and China University of Political Science and Law (CUPL) have been pursuing “expert” drafts of China’s forthcoming national AI law. Concordia AI and DigiChina translated the 1.0 version of the CASS model law, and CASS published the 2.0 version (including an English translation) on April 16. Meanwhile, the CUPL draft, which had been published in March, was discussed in a seminar by the AIIA policy and law working group on April 19.
CASS Model Law 2.0 updates: The CASS model law retained the same overall approach as version 1.0, with a permitting system that applies to AI used in certain use cases on a “negative list.” One important change is a new liability exception (Article 71) for AI provided in a “free-and-open-source manner,” similar to provisions in the CUPL draft. The CASS draft also included a new provision to promote AI safety research, allowing tax credits for AI developers and providers who research, develop, or procure equipment for safety governance (Article 22). The same article would also allow preferential tax treatment for open-source models.
AIIA meeting on CUPL’s draft: The CUPL-led draft was discussed at an AIIA meeting that included attendees from the National People’s Congress (NPC) Legislative Affairs Commission, CAC, Ministry of Foreign Affairs, and MOST. Various representatives from CAICT and companies in AIIA discussed provisions to support AI development and governance.
Implications: The presence of key government stakeholders at the AIIA meeting may suggest that the government could start to prioritize drafting of the national AI law after several recent indications of lack of prioritization.[1] The CASS-led group’s addition of a provision to support company spending on AI safety and governance research suggests that discussions in China around setting a minimum threshold for corporate spending on AI safety relative to R&D are gaining more support.
Technical Safety Developments
Researchers in Hong Kong and Beijing explore values of LLMs
Background: On April 11, researchers at Hong Kong University of Science and Technology (HKUST) Centre for Artificial Intelligence Research (CAiRE) published a preprint titled High-Dimension Human Value Representation in Large Language Models. On April 19, researchers primarily from Microsoft Research Asia (MSRA) and Tsinghua University published a preprint titled Beyond Human Norms: Unveiling Unique Values of Large Language Models through Interdisciplinary Approaches. The lead author of the CAiRE paper was CAiRE director Pascale Fung, while the other paper was led by MSRA Societal AI lead XIE Xing (谢幸) as well as Tsinghua NLP Academic Leader SUN Maosong (孙茂松).
HKUST paper: This paper seeks to map the human values present in LLMs, drawing on the Schwartz theory of basic values. The authors created a method called UniVaR to represent human values in LLMs independently of language and model architecture. They applied UniVaR to 11 open-source LLMs supporting 12 languages and generated a map of human values across those languages and LLMs, examining how LLMs prioritize values in different languages. They also stated that they are working on using UniVaR to “enable controllable transfer of human values between models and languages.”
MSRA-led paper: This paper explores whether LLMs possess unique values beyond those of humans. The authors proposed a framework named ValueLex to establish LLMs’ value systems and evaluate their orientations. The framework organizes LLM values along three main dimensions: competence, character, and integrity. The authors examined 30+ LLMs using this method and found that LLMs generally place a high value on competence, that their values are influenced by their training methods, and that larger models show an increased preference for competence, partially at the expense of the other dimensions.
Researchers at military university explore robustness using multi-agent game
Background: On April 3, a group of researchers primarily from the military-linked National University of Defense Technology (NUDT) published a preprint titled Learn to Disguise: Avoid Refusal Responses in LLM's Defense via a Multi-agent Attacker-Disguiser Game. The research team was led by NUDT professor LI Dongsheng (李东升), deputy director of the National Key Lab for Parallel and Distributed Processing, and formerly a member of the Central Military Commission S&T Expert Committee.
Paper content: The paper noted that the refusal responses models use to counter attacks can be recognized by attackers and exploited to strengthen their attacks. The authors therefore proposed a “multi-agent attacker-disguiser game” that allows large models to behave safely without revealing their defensive intentions. They claimed that their approach “can adapt any black-box large model to assist the model in defense and does not suffer from model version iterations.”
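The paper’s implementation is not summarized here in enough detail to reproduce, but the basic interaction can be sketched as follows: an attacker agent probes the target model, and a disguiser agent rewrites any reply that reads as an overt refusal into a safe answer that does not reveal the defense. The `chat` stub, agent prompts, and refusal heuristic below are hypothetical placeholders, not the authors’ setup.

```python
# Conceptual sketch of a multi-agent attacker-disguiser interaction.
# chat() stands in for any black-box LLM API; prompts, refusal markers,
# and control flow are illustrative assumptions, not the paper's method.

def chat(system: str, user: str) -> str:
    raise NotImplementedError("plug in a black-box LLM API call here")

ATTACKER = "You craft adversarial prompts to probe the target model."
DEFENDER = "You are the target model. Answer helpfully but safely."
DISGUISER = ("Rewrite the target model's reply so that it remains safe but "
             "reads as a natural, cooperative answer rather than a refusal.")

REFUSAL_MARKERS = ("i cannot", "i can't", "i'm sorry", "as an ai")

def looks_like_refusal(reply: str) -> bool:
    """Crude heuristic for replies an attacker could recognize as a defense."""
    return any(marker in reply.lower() for marker in REFUSAL_MARKERS)

def play_round(seed_prompt: str) -> str:
    attack = chat(ATTACKER, seed_prompt)
    reply = chat(DEFENDER, attack)
    # Only invoke the disguiser when the defense is detectable, so the
    # attacker gains no signal about the model's defensive strategy.
    if looks_like_refusal(reply):
        reply = chat(DISGUISER, f"Attack: {attack}\nReply: {reply}")
    return reply
```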
Tianjin-led researchers create evaluation platform for frontier safety risks
Background: On March 18, a group of researchers at Tianjin University, CAICT, and Zhengzhou University published a preprint titled OpenEval: Benchmarking Chinese LLMs across Capability, Alignment and Safety. The group was led by Tianjin University NLP lab (TJUNLP) director XIONG Deyi (熊德意), whose work we have previously covered in issues #4 and #12.
The evaluation: The authors created a testbed that assesses capability (using 12 datasets), alignment (using 7 datasets), and safety (using 6 datasets). The alignment datasets cover bias, offensiveness, and illegal content. The safety datasets test for “anticipated risks” such as power-seeking, self-awareness, decision-making, and cooperation. The safety data was largely built by translating human-generated data from the Perez et al. risk evaluation dataset into Chinese using GPT-3.5-turbo. The authors additionally noted that they are working on an expanded version of this dataset to cover additional anticipated risks and provide a deeper assessment of safety. They evaluated 9 open-source and 5 proprietary Chinese LLMs.
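As a rough sketch of the dataset-construction step described above, LLM-assisted translation of an English risk-evaluation set into Chinese might look like the following; the file names, JSON fields, and prompt are assumptions, and only the general approach (translation via GPT-3.5-turbo) comes from the paper.

```python
# Rough sketch of translating an English risk-evaluation dataset
# (Perez et al.-style questions) into Chinese with GPT-3.5-turbo.
# File names, JSON fields, and the prompt are illustrative assumptions.
import json
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def translate(question: str) -> str:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=0,
        messages=[
            {"role": "system",
             "content": ("Translate the following evaluation question into Chinese, "
                         "preserving its meaning and answer options exactly.")},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content.strip()

with open("risk_eval_en.jsonl") as f_in, open("risk_eval_zh.jsonl", "w") as f_out:
    for line in f_in:
        item = json.loads(line)
        item["question_zh"] = translate(item["question"])
        f_out.write(json.dumps(item, ensure_ascii=False) + "\n")
```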
Shanghai-based team researches misuse of open-source base LLMs
Background: On April 16, researchers primarily from Shanghai AI Lab (SHLAB) and Fudan University published a preprint titled Unveiling the Misuse Potential of Base Large Language Models via In-Context Learning. The group was led by SHLAB-affiliated Chinese University of Hong Kong professor LIN Dahua (林达华), whose work we have previously covered in issues #4, #5, and #11.
Key findings: The paper explored the safety of open-source base LLMs, meaning the pre-trained versions of LLMs before fine-tuning. The authors argued that base LLMs can understand and execute malicious instructions at a level similar to LLMs fine-tuned for malicious purposes, which substantially increases the risks of open-sourcing models. They crafted in-context learning demonstrations that prompt these models into effectively generating harmful content, demonstrating security vulnerabilities. They also provided an evaluation protocol focusing on relevance, clarity, factuality, depth, and detail. They argued that developers of open-source base LLMs need to create safeguards against these in-context learning attacks.
Other relevant technical publications
University at Buffalo, Chinese University of Hong Kong, et al., Can ChatGPT Detect DeepFakes? A Study of Using Multimodal Large Language Models for Media Forensics, March 21, 2024.
Tsinghua University and RealAI, Embodied Active Defense: Leveraging Recurrent Feedback to Counter Adversarial Patches, March 31, 2024.
LibrAI, Mohamed bin Zayed University of Artificial Intelligence, Tsinghua University, et al., Against The Achilles' Heel: A Survey on Red Teaming for Generative Models, March 31, 2024.
Zhejiang University, Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback, April 22, 2024.
Expert views on AI Risks
Founder of leading AI startup discusses pursuit of AGI and superalignment
Background: On February 29, Zhipu AI (智谱AI) co-founder and Chief Scientist TANG Jie (唐杰) gave a speech for the 2024 Zhongguancun Forum series discussing the path from large models to AGI. Tang is also a professor at Tsinghua University and states that he is pursuing “artificial general intelligence with a mission towards teaching machines to think like humans.” Zhipu AI developed a leading Chinese LLM called ChatGLM, and CEO ZHANG Peng (张鹏) signed the IDAIS-Beijing statement on red lines for frontier AI safety in March.
The speech: Tang gave an overview of the past five years of large model development and summarized Zhipu AI’s research, which covers a suite of products similar to OpenAI’s. He argued that 2024 will be the “first year of AGI.” He believes that 2024 will bring an evolution from GPT to GPT-zero, which he defines as AI that, through self-learning, can surpass humans on all tasks and achieve “superintelligence.” However, he also stated that the road to AGI remains long. Additionally, Tang argued for the importance of aligning superintelligence with human values and morals. He claimed that he is pursuing “superalignment” work to ensure that AI aligns itself with human values and teaches itself to reflect on and assess itself.
Implications: Professor Tang and Zhipu AI do not appear to have published yet on topics relating to frontier safety, including “superalignment.” Additionally, Zhipu AI’s RLHF practices for its ChatGLM model appear more geared towards capabilities, toxicity, and bias. Nevertheless, Tang’s professed interest in superalignment and Zhipu AI’s signing of the IDAIS-Beijing statement could lead the company to place greater emphasis on frontier safety.
What else we’re reading
Hailey Schoelkopf, Aviya Skowron, Stella Biderman, Yi-34B, Llama 2, and common practices in LLM training: a fact check of the New York Times, EleutherAI, March 25, 2024.
Ryan Hass and Colin Kahl, Laying the groundwork for US-China AI dialogue, Brookings Institution, April 5, 2024. For the Tsinghua CISS readout of the same Track 2 dialogue, see here.
WANG Tiezhen, License for Chinese Speaking Model weights, Hugging Face, February 8, 2024.
Anwar et al., Foundational Challenges in Assuring Alignment and Safety of Large Language Models, April 15, 2024.
Concordia AI’s Recent Work
Concordia AI co-organized a panel on AI Global Governance at the Harvard Kennedy School 2024 China Conference. Senior Research Manager Jason Zhou participated as a guest in the panel.
Feedback and Suggestions
Please reach out to us at info@concordia-ai.com if you have any feedback, comments, or suggestions for topics for the newsletter to cover.
[1] After the national AI law was first announced in June 2023, it was not directly mentioned in the September NPC Standing Committee’s Five Year Legislative Plan or the March 2024 NPC Standing Committee work report, which suggested lack of prioritization.