

Beijing Xiaomi Mobile Software Co., Ltd.

Expo Demonstration

Xiaomi GUI Agent: Cross-Device Intelligent Task Execution via Natural Language

Mengchu Zhang ⋅ Pei Fu

GRAND BALLROOM FOYER
Mon 6 Jul 12:30 p.m. KST — 2:30 p.m. KST

Abstract:

We demonstrate Xiaomi GUI Agent, a cross-device intelligent assistant system that lets users control smartphones through natural language commands issued on a PC. A user types an instruction (e.g., "Order a coffee from the Luckin app and send me the receipt") in a PC messaging app (e.g., Feishu/Lark). The instruction is relayed to the GUI Agent on the smartphone, which visually perceives the screen, reasons about the task, and autonomously operates the phone (tapping buttons, typing text, navigating menus) to complete it. The result (e.g., a screenshot or text confirmation) is sent back to the user's PC chat. This demonstration showcases the practical deployment of vision-language models for real-world GUI automation, highlighting the agent's ability to handle multi-step, cross-app tasks on real smartphones with real applications.
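
The perceive, reason, act cycle described in the abstract can be illustrated with a minimal Python sketch. This is not the demonstrated system: every name here (`Action`, `FakePhone`, `scripted_policy`, `run_task`) is hypothetical, the phone is an in-memory stand-in, and the scripted policy takes the place of the vision-language model, whose interface the abstract does not describe.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    kind: str           # "tap", "type", or "done" (hypothetical action set)
    argument: str = ""  # tap target, text to type, or final result message


class FakePhone:
    """In-memory stand-in for a real device; it only records actions."""

    def __init__(self) -> None:
        self.log: List[Action] = []

    def screenshot(self) -> str:
        # A real agent would capture pixels; a text description is used here.
        return f"screen after {len(self.log)} actions"

    def perform(self, action: Action) -> None:
        self.log.append(action)


def scripted_policy(instruction: str, screen: str, step: int) -> Action:
    """Placeholder for the model: replays a fixed plan, ignoring the screen."""
    plan = [
        Action("tap", "Luckin app icon"),
        Action("tap", "Order"),
        Action("done", "order placed"),
    ]
    return plan[min(step, len(plan) - 1)]


def run_task(
    instruction: str,
    phone: FakePhone,
    policy: Callable[[str, str, int], Action],
    max_steps: int = 10,
) -> str:
    """Loop: observe the screen, choose an action, apply it, until done."""
    for step in range(max_steps):
        screen = phone.screenshot()                  # perceive
        action = policy(instruction, screen, step)   # reason
        if action.kind == "done":
            return action.argument                   # result relayed to the PC chat
        phone.perform(action)                        # act
    return "step limit reached"


if __name__ == "__main__":
    phone = FakePhone()
    print(run_task("Order a coffee from the Luckin app", phone, scripted_policy))
```

The step limit is a design choice of this sketch: it keeps a policy that never emits a "done" action from looping indefinitely on the device.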
