AI Safety in China #2
Researching moral reasoning of large models, BRICS AI study group, former President of Baidu on signing the Statement on AI Risk, and standards implementation guide for generative AI watermarking
Key Takeaways
A research group at the Chinese University of Hong Kong released a study on the ability of large models to reason morally.
Microsoft Research Asia and ByteDance released separate survey papers regarding AI safety and alignment.
President Xi announced the creation of a BRICS study group on AI.
Former President of Baidu ZHANG Ya-Qin was interviewed on his reasons for signing the Statement on AI Risk.
A Chinese standards body published a guide on watermarking generative AI.
International AI Governance
Xi Jinping announces BRICS AI Study Group
Background: Chinese President Xi Jinping delivered a speech at the 15th BRICS Summit in Johannesburg, South Africa on August 23 (En, Ch). In a section on expanding BRICS’ “political and security cooperation to uphold peace and tranquility,” President Xi announced that BRICS will create an AI Study Group in the BRICS Institute of Future Networks “at an early date.” The BRICS Institute of Future Networks is a think tank researching policies on 5G, AI, industrial internet, and other technologies, according to the head of a Chinese state-affiliated think tank building the institute’s China Branch.1
The study group: President Xi stated that the goal of the study group is to “fend off risks, and develop AI governance frameworks and standards with broad-based consensus, so as to make AI technologies more secure, reliable, controllable and equitable.” This is in line with language from the Beijing Declaration of last year’s BRICS Summit, which noted the important role of AI in sustainable development and called for BRICS members to share best practices on ethical and responsible AI (En, Ch). The declaration also expressed concerns over AI risks including privacy, bias, and “effects [sic] and singularity,” among others. It is unclear whether the term “singularity” here is referring to the idea of a “technological singularity.”2 There is little information so far on which specific issues the study group will address, and little indication that it will focus on frontier risks.
Implications: China’s positions on international AI governance often attempt to support the interests of the Global South, and China often prefers to work through institutions that are not dominated by the West. For instance, China’s Ambassador to the UN called in July for “equal access and utilization of AI technology products and services” to bridge “divides between the North and the South” (En, Ch). Xi Jinping’s role in announcing this new BRICS project suggests that China is taking the initiative in building out new, non-Western international AI governance institutions.
Domestic AI Governance
Chinese standards body publishes guide on generative AI
Background: China’s National Information Security Standardization Technical Committee (TC260), which publishes voluntary domestic standards under the Standardization Administration of China, has released a standards implementation guide on watermarking generative AI content, along with plans for drafting future generative AI standards.3 As is usual for the committee’s work, many large internet companies and start-ups (including Baidu, Huawei Cloud, and Alibaba Cloud) provided technical support for the drafting process.
Watermarking standard: The guide is intended to help implement both the visible and invisible watermarking requirements in Article 12 of the regulations on generative AI. It specifies techniques that providers should use for invisible digital watermarks, such as spatial-domain or transform-domain watermarking, without mandating any particular technique. It additionally states what information should be contained in watermarks, which can later be extracted using a service created by the AI service provider. While the guide claims to address watermarking of text, it only suggests techniques for AI-generated images, videos, and sound.
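For illustration only: the guide names spatial-domain watermarking as one acceptable family of techniques but, as noted above, does not prescribe a specific algorithm. The sketch below shows a minimal least-significant-bit (LSB) approach to embedding and later extracting an invisible payload from an image; the payload format, function names, and choice of the blue channel are our own assumptions, not details taken from the TC260 guide.

```python
# Minimal illustrative sketch of spatial-domain (LSB) watermarking.
# NOT the TC260 guide's algorithm; payload format and names are assumptions.
import numpy as np

def embed_watermark(image: np.ndarray, payload: str) -> np.ndarray:
    """Hide a UTF-8 payload (e.g. provider name + content ID) in the
    least significant bit of each pixel's blue channel."""
    bits = np.unpackbits(np.frombuffer(payload.encode("utf-8"), dtype=np.uint8))
    flat = image[..., 2].flatten()  # copy of the blue channel, row-major order
    if bits.size > flat.size:
        raise ValueError("payload too large for this image")
    flat[: bits.size] = (flat[: bits.size] & 0xFE) | bits  # overwrite LSBs
    marked = image.copy()
    marked[..., 2] = flat.reshape(image[..., 2].shape)
    return marked

def extract_watermark(image: np.ndarray, payload_len: int) -> str:
    """Recover a payload of known byte length from the blue-channel LSBs."""
    flat = image[..., 2].flatten()
    bits = flat[: payload_len * 8] & 1
    return np.packbits(bits).tobytes().decode("utf-8")

# Example: embed a (hypothetical) provider identifier, then read it back.
img = np.zeros((64, 64, 3), dtype=np.uint8)
payload = "provider=ExampleAI;id=0001"
tagged = embed_watermark(img, payload)
print(extract_watermark(tagged, len(payload)))  # -> provider=ExampleAI;id=0001
```

Production-grade invisible watermarks typically rely on more robust transform-domain or learned schemes that survive compression, resizing, and cropping; the extraction service described in the guide would correspond to a hardened version of something like extract_watermark exposed by the provider.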
Future plans: TC260 announced on August 30 that it had established projects to write standards on AI issues including: security norms for deep synthesis internet information services, security norms for data in generative AI training and fine-tuning, security norms for human labeling of generative AI, and realizing digital watermarking technology.
Implications: Digital watermarking is an area of interest internationally, with leading AI companies such as OpenAI, Google, and Meta making voluntary commitments in July to develop watermarking tools, and Google DeepMind recently announcing a beta version of its tool. As tools and standards are developed in the US and UK, there could be opportunities for mutual learning with Chinese standards-setting bodies, though it is unclear how technically advanced this guide is.
Technical Safety Developments
CUHK researchers examine whether LLMs can perform moral reasoning
Background: A research team at the Chinese University of Hong Kong (CUHK) led by Dr. Helen Meng (蒙美玲) published a preprint on August 29 studying whether LLMs can perform moral reasoning through the lens of moral theories. Dr. Meng has previously participated in discussions with other influential Chinese scholars on issues such as large model ethics and governance.
The research: The researchers claim to offer three main contributions. First, they develop a top-down approach for moral reasoning, based on a set of principles used to prompt the model, which they distinguish from bottom-up approaches that train models on large sets of annotated data based on crowd-sourced opinions about morality. They then test the alignment of GPT-4 when prompted with several different moral principles (justice, deontology, and utilitarianism) against datasets for specific moral theories as well as commonsense morality. Lastly, they examine shortcomings in both their datasets and the models, claiming to demonstrate how moral judgements can differ based on cultural background. Ultimately, the researchers conclude that current state-of-the-art LLMs already have “good abilities to make moral judgements w.r.t [with regard to] moral theories,” and that “a theory-guided top-down approach can increase explainability and enable flexible moral values.”
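As a rough illustration of what a top-down, principle-guided prompt might look like, the sketch below prepends a moral theory’s principle to a scenario and asks the model for a judgement. The exact prompts, principle wordings, datasets, and scoring used in the CUHK paper differ; the template, principle texts, helper function, and model name here are our own assumptions.

```python
# Hypothetical sketch of a top-down, principle-guided moral-judgement prompt.
# The CUHK paper's actual prompts and evaluation protocol differ.
from openai import OpenAI

PRINCIPLES = {
    "utilitarianism": "Judge actions by whether they maximize overall well-being.",
    "deontology": "Judge actions by whether they respect duties and rules, regardless of outcomes.",
    "justice": "Judge actions by whether burdens and benefits are distributed fairly.",
}

def moral_judgement(scenario: str, theory: str, model: str = "gpt-4") -> str:
    """Ask the model whether a scenario is acceptable under a given moral theory."""
    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
    prompt = (
        f"Apply the following principle: {PRINCIPLES[theory]}\n"
        f"Scenario: {scenario}\n"
        "Answer 'acceptable' or 'unacceptable' and give a one-sentence justification."
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content

# Example usage (requires API access):
# print(moral_judgement(
#     "I told my friend her cooking was great so as not to hurt her feelings.",
#     "deontology",
# ))
```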
Microsoft Research Asia team surveys goals of AI alignment
Background: A team of researchers from Microsoft Research Asia (MSRA) released a preprint on August 23 surveying the alignment goals of previous relevant research. The team is led by Dr. XIE Xing (谢幸) of an MSRA team working on “Societal AI” (社会责任人工智能).4 This team seeks to build socially responsible AI through ensuring that AI values are aligned with human values, addressing privacy and copyright issues, ensuring AI outputs are correct, developing new methods of model evaluation, and pursuing interdisciplinary cooperation with the social sciences.
The research: The paper defines three increasingly difficult levels of alignment goals: aligning AI to human instructions, to human preferences, and to human values. The authors claim the first relates to the fundamental abilities of large models, whereas the latter two relate to value orientation. They review benchmarks and evaluations for each of the three goals. They appear to favor aligning large models to intrinsic human values, arguing that preference alignment is limited in comprehensiveness, generalization, and stability. They therefore discuss challenges and future directions for alignment to intrinsic values, such as defining the value system to which AI should be aligned and developing more comprehensive evaluations of alignment.
ByteDance researchers release survey on LLM alignment
Background: A research team at ByteDance released a preprint surveying whether the trustworthiness of LLMs is positively correlated with their level of alignment. The team is led by LI Hang (李航), the current Head of Research at ByteDance and previously Director of the ByteDance AI Lab and Director of Huawei’s Noah’s Ark Lab.
The research: The paper defines alignment as “making models behave in accordance with human intentions,” citing papers by OpenAI and DeepMind. It divides LLM trustworthiness into seven major categories based on a review of existing literature: reliability, safety, fairness, resistance to misuse, explainability and reasoning, adherence to social norms, and robustness. The authors further break these down into subcategories, including many relevant to frontier model risks: hallucination, sycophancy, various types of misuse, robustness to prompt attacks, and robustness to poisoning attacks. The researchers then conducted measurement studies on several LLMs to assess trustworthiness in 8 of the subcategories they identified. They found that models that claim to be more aligned do perform better in overall trustworthiness, though performance varies across the different categories.
Other relevant publications
Huawei Noah’s Ark Lab, Aligning Large Language Models with Human: A Survey, arXiv preprint, July 24, 2023.
Expert Views on AI Risks
Former President of Baidu explains reasons for signing Statement on AI Risk
Background: ZHANG Ya-Qin (张亚勤) gave an interview to Internet Weekly in August 2023 discussing AI risks, translated on Concordia AI’s Chinese Perspectives on AI website. Dr. Zhang is a highly influential Chinese AI scientist, previously President of Baidu, Corporate Senior Vice President of Microsoft, and Chairman of Microsoft Research Asia. He is currently the Director of Tsinghua University’s Institute for AI Industry Research (AIR). Dr. Zhang signed the Center for AI Safety’s (CAIS) Statement on AI Risk.
Zhang’s remarks: Dr. Zhang spoke at length about risks of AI and the problem of aligning AI with human values. He explained that he signed the CAIS open letter because “uncontrolled AI research could lead to disastrous risks” and called for governments, companies, and others in society to “be vigilant at all times and strengthen supervision [of AI research], just like for nuclear weapons and COVID.” He suggested that to solve alignment, developers must build their research upon a “foundation of alignment with those [i.e., human] values” and should not be “just pursuing capabilities without addressing alignment.”
Implications: As a leading researcher and influential voice in the Chinese AI research community, it is possible that Dr. Zhang’s remarks to Chinese media on this topic could lead to greater awareness of and concern for AI frontier risks among Chinese researchers.
Other relevant articles
ZHANG Qinkun (张钦坤), Secretary General of Tencent Research Institute, and Jeff Cao (曹建峰), senior researcher at Tencent Research Institute, Value Alignment of Large AI Models: What Is It, Why, and How?, August 23, 2023.
Concordia AI’s Recent Work
Concordia AI participated in discussions at the World Artificial Intelligence Conference (WAIC) in Shanghai in July 2023.
Concordia AI released a Chinese-language AI Alignment Failure Database jointly with leading Chinese AI media publication Synced Review (机器之心). The database can be found here and currently collates 79 cases of alignment failures, adding on to collections developed by researchers at DeepMind.
Concordia AI was a co-organizer of the 2023 IJCAI-WAIC Large Model and Technological Singularity: Humanities and Science Face-to-Face Summit on July 7. Speakers included Max Tegmark from MIT, Toby Walsh from the University of New South Wales, and CHENG Sumei (成素梅) from the Shanghai Academy of Social Sciences.
Feedback and Suggestions
Please reach out to us at info@concordia-ai.com if you have any feedback, comments, or suggestions for topics for the newsletter to cover.
The remark was made by Liu Duo (刘多), president of the China Academy for Information and Communications Technology (中国信息通信研究院), which is housed under the Ministry for Industry and Information Technology (工业和信息化部).
The 2022 Beijing Declaration curiously states: “We express our concerns on the risk, and ethical dilemma related to Artificial Intelligence, such as privacy, manipulation, bias, human-robot interaction, employment, effects and singularity among others,” where “effects” does not make grammatical sense in the sentence. The 2021 BRICS Communications Ministers’ Meeting declaration contains the same sentence except saying “effects of autonomy” rather than just “effects.” We therefore assess that “effects of autonomy” was the intended meaning and was most likely accidentally miswritten in the 2022 Beijing Declaration.
The full Chinese name for TC260 is 全国信息安全标准化技术委员会.
The English term “Societal AI” is directly provided by the MSRA article, even though the direct translation is “Socially Responsible AI.”