Expo Demonstration
Xiaomi GUI Agent: Cross-Device Intelligent Task Execution via Natural Language
Mengchu Zhang ⋅ Pei Fu
GRAND BALLROOM FOYER
We demonstrate Xiaomi GUI Agent, a cross-device assistant system that lets users control a smartphone through natural language commands issued on a PC. The user types an instruction (e.g., "Order a coffee from the Luckin app and send me the receipt") in a PC messaging app (e.g., Feishu/Lark). The instruction is relayed to the GUI Agent on the smartphone, which visually perceives the screen, reasons about the task, and autonomously operates the phone (tapping buttons, typing text, navigating menus) to complete it. The result (e.g., a screenshot or text confirmation) is sent back to the user's PC chat. The demonstration shows a practical deployment of vision-language models for real-world GUI automation, highlighting the agent's ability to handle multi-step, cross-app tasks on real smartphones running real applications.
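The perceive, reason, act cycle described above can be illustrated with a minimal Python sketch. It is not the Xiaomi implementation: `FakePhone`, `Action`, `scripted_policy`, and `run_task` are hypothetical names, the "screen" is a text label rather than a real screenshot, and a fixed lookup table stands in for the vision-language model. The PC chat relay is omitted; the returned string plays the role of the confirmation sent back to the user.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Action:
    kind: str          # "tap", "type", or "done"
    target: str = ""   # UI element to tap (hypothetical label)
    text: str = ""     # text to type, or the final report when kind == "done"


@dataclass
class FakePhone:
    """Stand-in for a real device; the screen is a text label, not pixels."""
    screen: str = "home"
    log: List[str] = field(default_factory=list)

    def screenshot(self) -> str:
        # Perceive: a real agent would capture an image here.
        return self.screen

    def apply(self, action: Action) -> None:
        # Act: record the action; a tap moves to the screen named by its target.
        self.log.append(f"{action.kind}:{action.target or action.text}")
        if action.kind == "tap":
            self.screen = action.target


def scripted_policy(instruction: str, screen: str) -> Action:
    # Reason: assumed placeholder for a vision-language model call.
    plan = {
        "home": Action("tap", target="coffee_app"),
        "coffee_app": Action("tap", target="order_confirmed"),
    }
    return plan.get(screen, Action("done", text=f"Finished on screen '{screen}'"))


def run_task(instruction: str, phone: FakePhone,
             policy: Callable[[str, str], Action], max_steps: int = 10) -> str:
    """Loop perceive -> reason -> act until the policy reports completion."""
    for _ in range(max_steps):
        action = policy(instruction, phone.screenshot())
        if action.kind == "done":
            return action.text  # would be relayed back to the PC chat
        phone.apply(action)
    return "Stopped: step limit reached"
```

Calling `run_task("Order a coffee", FakePhone(), scripted_policy)` walks the fake device through two taps and returns a completion message. The step limit is a design choice of this sketch, included so that a policy that never reports completion cannot loop indefinitely.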