Concordia AI hosting AI safety sessions at the International AI Cooperation and Governance Forum 2024
Hosted at the National University of Singapore, in collaboration with Tsinghua University and HKUST
On December 2-3, 2024, Concordia AI and the AI Verify Foundation co-hosted three panels on AI safety at the International AI Cooperation and Governance Forum 2024 at the National University of Singapore (NUS), in collaboration with Tsinghua University and Hong Kong University of Science and Technology. The forum’s safety sessions focused on three key topics: big picture priorities and international cooperation on AI safety, the science of AI safety evaluations, and the relationship between safety at the foundation model and downstream application levels. Concordia AI Senior Research Manager Jason Zhou hosted the safety proceedings.
Presentations on international AI safety cooperation
Executive Director of the Digital Trust Centre Singapore and the Singapore AI Safety Institute (AISI) LAM Kwok Yan outlined a “TrustTech” approach to developing AI technology. Lam described Singapore’s efforts to develop AI trust technologies to enable secure collaboration across organizations, strengthen robustness, and bridge academic research with real-world deployment. He emphasized treating AI systems as socio-technical systems and mitigating vulnerabilities in foundation models that could lead to collective societal failures. Singapore AISI’s work spans four core areas: testing and evaluation, safe model design and deployment, content assurance, and governance and policy.
UK AI Safety Institute (UK AISI) Chief Technology Officer Jade Leung discussed the organization’s use of AI safety testing methods including automated benchmarks, human uplift trials, and expert red teaming in five key domains of concern: chemical/biological misuse, cyber misuse, autonomous systems, safeguards, and societal impacts. She presented UK AISI’s open-source INSPECT testing platform and joint testing efforts between the UK, Singapore, and the US. Leung also shared UK AISI’s international collaboration efforts, including engaging a range of countries in the AI safety summits, commissioning the International Scientific Report on the Safety of Advanced AI, and work to secure corporate Frontier AI Safety Commitments.
Tsinghua Institute for AI International Governance (I-AIIG) Dean XUE Lan (薛澜) shared seven key views from China’s network of AI safety research institutions. He advocated for advancing AI safety and development simultaneously under the UN’s Common Agenda and Global Digital Compact. He emphasized fairness through globally interoperable AI safety testing systems and auditable technologies. Xue called for international cooperation on data security and privacy protection, as well as enhanced coordination to prevent the misuse of AI for activities such as misinformation. He stressed the need for increased international investment in AI safety R&D to reduce the risk of AI going out of control, while strengthening global risk reporting and policy sharing through AI safety summits. Finally, he highlighted the importance of AI capacity building in developing countries to achieve shared security.
Tsinghua Professor and Zhipu AI Chief Scientist TANG Jie (唐杰) demonstrated various AI models developed at Zhipu AI, including ChatGLM and the agentic AutoGLM. His presentation discussed several key safety concerns, including jailbreak attacks, and described building the SafetyBench dataset and conducting RLHF training to improve model safety. He also raised emerging challenges in ensuring the safety of multimodal, increasingly agentic, and potentially embodied systems.
Friederike Grosse-Holz from the EU AI Office’s AI Safety Unit gave an online address on the EU’s approach to regulating general-purpose AI models. She explained that Chapter 5 of the EU AI Act, implemented through the forthcoming General-Purpose AI Code of Practice, establishes the key regulatory framework. The regulations impose transparency obligations on AI providers, requiring them to share specific model information with the EU AI Office and downstream providers while adhering to EU copyright law. She noted that for models identified as posing systemic risks, providers must conduct thorough risk assessments and implement appropriate mitigation measures.
Panel on international AI safety cooperation
Concordia AI CEO Brian Tse (谢旻希) moderated this panel, which explored emerging developments in AI safety and opportunities for collective action in 2025. Tang presented a framework of AI development layers progressing from language understanding, advanced reasoning, and tool use to self-learning, emphasizing the need for systemic risk assessments as systems advance toward AGI. Xue advocated for scenario planning for different AI risks, drawing upon lessons from the crisis management field, and for more frequent updates to scientific assessments than the IPCC’s five-year cycle. On AI safety testing, Leung highlighted that detailed modeling of AI risks does not yet exist and that more scientific effort is needed to ensure replicability, which has required UK AISI to develop many methods and tools from scratch. Lam stressed the importance of building trustworthy digital-physical interfaces, especially in safety-critical industries where malfunctions can threaten human life. The panel also discussed the emerging challenge of monitoring and preventing various forms of deception when AI systems interact with human users. Looking ahead to 2025, panelists proposed ideas such as an ITER (global fusion project)-like global AI safety scientific project (Xue), systemic testing of AI safety against risk thresholds and red lines (Lam and Leung), and a global definition of AI safety (Tang).
Panel on AI safety testing science
This panel’s guests were Director of the National University of Singapore AI Institute Professor Mohan Kankanhalli, University of Illinois at Urbana-Champaign Professor LI Bo, and Tsinghua University Professor HUANG Minlie (黄民烈), with Singapore Infocomm Media Development Authority Director for Data-Driven Tech Wan Sie LEE moderating.
The panel examined approaches to testing methodologies and evaluation frameworks in AI safety. Kankanhalli noted that AI safety science remains in an early, empirical stage, suggesting that the field should draw inspiration from computer security's adversarial frameworks and control systems' mathematical modeling of boundary conditions. Li emphasized incorporating symbolic rules and principles for safety guarantees, addressing the challenges of bug fixes and long-tail risks in AI systems. Huang raised the promise of erasing harmful knowledge through machine unlearning as a countermeasure for jailbreak attacks, an approach also endorsed by Kankanhalli. The panel provided recommendations for international cooperation: standards for safety and security in certain domains (Li), open-source attack simulation projects (Huang), and exploring the risks of agentic systems operating in the physical world (Kankanhalli).
Panel on AI safety cooperation between regulators and industry
The participants in this panel were EU General-Purpose AI Code of Practice Vice-Chair Nitarshan Rajkumar, Resaro AI Managing Partner and CEO April Chin, and BCG X Principal Engineer (SEA) Robin Weston, with AI Verify Executive Director Shameek Kundu moderating.
The final panel explored the critical intersection between foundation model safety and downstream commercial applications. Rajkumar noted that while businesses focus on operational risks, governments must address broader national security concerns. He also compared foundation models to upstream nuclear power plants and AI applications to downstream power outlets to draw attention to the differing safety requirements at each level: upstream safety measures may be more significant and require government supervision. Chin described Resaro AI’s work testing AI systems in high-risk applications such as healthcare and education, bridging the gap between academic benchmarks and more use-specific benchmarks. She noted that the number of stakeholders involved poses a challenge; in one instance, over 700 test cases were required before a chatbot could be deployed. Weston advocated for a “continuous delivery” approach to AI deployment, arguing that gradual updates improve understanding of system behavior, help identify the sources of problems, and account for the fundamentally unpredictable behavior of software in the real world.