Workshop
Trustworthy Multi-modal Foundation Models and AI Agents (TiFA)
Zhenfei (Jeremy) Yin · Yawen Duan · Lijun Li · Jianfeng Chi · Yichi Zhang · Pavel Izmailov · Bo Li · Andy Zou · Yaodong Yang · Hang Su · Jing Shao · Yu Qiao · Jun Zhu · Xuanjing Huang · Wanli Ouyang · Dacheng Tao · Phil Torr
Straus 1
Sat 27 Jul, 12:30 a.m. PDT
Advanced Multi-modal Foundation Models (MFMs) and AI Agents, equipped with diverse modalities [1, 2, 3, 4, 15] and an increasing number of available affordances [5, 6] (e.g., tool use, code interpreters, and API access), have the potential to accelerate and amplify their predecessors’ impact on society [7].
MFMs include multi-modal large language models (MLLMs) and multi-modal generative models (MMGMs). MLLMs are LLM-based models that can receive, reason over, and output information in multiple modalities, including but not limited to text, images, audio, and video. Examples include Llava [1], Reka [8], QwenVL [9], and LAMM [36]. MMGMs are a class of MFMs that can generate new content across multiple modalities, such as generating images from text descriptions or creating videos from audio and text inputs. Examples include Stable Diffusion [2], Sora [10], and Latte [11]. AI agents, or systems with a higher degree of agenticness, are systems that can achieve complex goals in complex environments with limited direct supervision [12]. Understanding and preempting the vulnerabilities of these systems [13, 35] and the harms they induce [14] has become more crucial than ever.
Building trustworthy MFMs and AI Agents goes beyond the adversarial robustness of such models. It also requires proactive risk assessment, mitigation, safeguards, and the establishment of comprehensive safety mechanisms throughout the lifecycle of the systems’ development and deployment [16, 17]. This approach demands a blend of technical and socio-technical strategies, incorporating AI governance and regulatory insights.
Topics include but are not limited to:
- Adversarial attack and defense, poisoning, hijacking and security [18, 13, 19, 20, 21]
- Robustness to spurious correlations and uncertainty estimation
- Technical approaches to privacy, fairness, accountability and regulation [12, 22, 28]
- Truthfulness, factuality, honesty and sycophancy [23, 24]
- Transparency, interpretability and monitoring [25, 26]
- Identifiers of AI-generated material, such as watermarking [27]
- Technical alignment / control, such as scalable oversight [29], representation control [26] and machine unlearning [30]
- Model auditing, red-teaming and safety evaluation benchmarks [31, 32, 33, 16]
- Measures against malicious model fine-tuning [34]
- Novel safety challenges with the introduction of new modalities