Google

Expo Demonstration

Pushing the Frontiers of Large-Scale 3D modeling for Robotics & Beyond

Krzysztof Choromanski

GRAND BALLROOM FOYER
Sun 5 Jul 8:30 p.m. PDT — 10:30 p.m. PDT

Abstract:

3D understanding of the surrounding world is one of the key prerequisites for building truly foundational robotics models. In this demonstration, we will present some of the latest technologies that address this challenge. We will discuss a wide spectrum of methods, ranging from 3D-aware positional encoding mechanisms in Transformers to Gromov-Wasserstein techniques with geodesic distances, and present several robotics applications. The presentation will be complemented by a live demo involving the audience, showing some of the discussed techniques in action.

1. Learning the RoPEs: Better 2D & 3D Positional Encodings with STRING
https://arxiv.org/abs/2502.02562
https://sites.google.com/view/string-robotics


In this part of the workshop, we will present STRING, a new class of 3D-aware positional encodings for Transformers that extends the popular RoPE mechanism while remaining exactly translation-invariant. We will show applications ranging from 3D scene understanding to designing robotic policies that operate on data with depth information. The presented concepts will be accompanied by a live demo showing them in action.
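To make the translation-invariance property concrete, here is a minimal sketch of a plain RoPE-style rotation applied to 3D coordinates. It is not the STRING implementation: the function name `rope_3d`, the random frequency matrix `freqs`, and all dimensions are assumptions chosen for illustration. The point is only that, when the rotation angles are linear in the position, the query-key score depends on the difference of the two positions and is unchanged when both are shifted by the same vector.

```python
import numpy as np

def rope_3d(x, pos, freqs):
    """Rotate consecutive feature pairs of x by angles that are linear in pos.

    x:     (d,) feature vector, d even
    pos:   (3,) 3D coordinates of the token
    freqs: (d // 2, 3) one frequency vector per 2D block (hypothetical values)
    """
    angles = freqs @ pos                      # one angle per 2D block
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = cos * x_even - sin * x_odd    # standard 2D rotation per block
    out[1::2] = sin * x_even + cos * x_odd
    return out

rng = np.random.default_rng(0)
d = 8
freqs = rng.normal(size=(d // 2, 3))          # illustrative, not learned
q, k = rng.normal(size=d), rng.normal(size=d)
p_q, p_k = rng.normal(size=3), rng.normal(size=3)
shift = np.array([10.0, -3.0, 2.5])           # common translation of the scene

score = rope_3d(q, p_q, freqs) @ rope_3d(k, p_k, freqs)
score_shifted = rope_3d(q, p_q + shift, freqs) @ rope_3d(k, p_k + shift, freqs)
print(np.isclose(score, score_shifted))
```

The two scores agree up to floating-point error because the product of the two block rotations only involves the angle difference `freqs @ (p_k - p_q)`.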

2. RelFlexformer: Efficient Attention 3D Transformers for Integrable Relative Positional Encodings
https://arxiv.org/pdf/2605.10706
https://relflexformer.github.io/

RelFlexformers are a recently introduced, powerful class of 3D Transformers equipped with general additive relative positional encoding (RPE) techniques. These techniques significantly improve downstream performance on tasks ranging from 3D classification to 3D segmentation, for both point-cloud and depth-image modalities (often providing gains over regular Transformer models). RelFlexformers are also fully compatible with low-rank linear attention methods (such as Performers) via efficient RPE calculations with the Non-Uniform Fast Fourier Transform.
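As a point of reference for what "additive RPE" means, the following is a brute-force sketch of softmax attention with a bias that depends only on the relative displacement between 3D points. It is not the RelFlexformer algorithm: it builds the full n x n bias matrix explicitly, which is exactly the quadratic cost that the efficient NUFFT-based approach is meant to avoid. The Gaussian-shaped bias, the `sigma` parameter, and the function name are assumptions made for illustration.

```python
import numpy as np

def attention_with_additive_rpe(Q, K, V, points, sigma=1.0):
    """Softmax attention with logits q_i . k_j / sqrt(d) + f(r_i - r_j).

    Here f(delta) = -||delta||^2 / (2 sigma^2) is a hypothetical choice of bias.
    """
    d = Q.shape[-1]
    diff = points[:, None, :] - points[None, :, :]         # (n, n, 3) displacements
    bias = -np.sum(diff ** 2, axis=-1) / (2.0 * sigma ** 2)
    logits = Q @ K.T / np.sqrt(d) + bias                   # additive RPE
    logits -= logits.max(axis=-1, keepdims=True)           # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
n, d = 16, 8
Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))
points = rng.normal(size=(n, 3))                           # toy point cloud

out = attention_with_additive_rpe(Q, K, V, points)
out_shifted = attention_with_additive_rpe(Q, K, V, points + np.array([5.0, -2.0, 1.0]))
print(out.shape, np.allclose(out, out_shifted))
```

Because the bias is a function of `r_i - r_j` alone, translating the whole point cloud leaves the attention output unchanged.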

3. GenusSink: Unlocking Optimal Transport with Geodesic Distances for 3D Robotics
https://arxiv.org/abs/2605.09782

In this part of the workshop, we will focus on a new class of methods for efficiently solving the Wasserstein / optimal transport (OT) problem (or its regularized Sinkhorn versions) for geodesic distances, using new tools from structural graph theory and computational geometry. Geodesic distances play a critical role in robotics and are used on a regular basis, in particular for graph representations of manifolds. The Gromov-Wasserstein distance can be used to define the similarity between complex objects and as such is relevant for pose estimation, motion tracking and template detection techniques. The presented algorithms make it possible to carry out OT calculations with geodesic distances in near-linear time for the regular Sinkhorn-regularized Wasserstein problem and in sub-cubic (quadratic or linear) time for the Gromov-Wasserstein lifting (involving two metric spaces).
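For orientation, the sketch below shows the standard dense baseline that such methods aim to accelerate: geodesic (shortest-path) distances on a small graph, followed by textbook Sinkhorn iterations for the entropy-regularized OT problem. It is not the GenusSink algorithm. The toy cycle graph, the regularization strength `eps`, the iteration count, and both function names are assumptions for illustration; each Sinkhorn step here costs quadratic time in the number of nodes because the kernel matrix is stored explicitly.

```python
import numpy as np

def geodesic_distances(weights):
    """All-pairs shortest paths (Floyd-Warshall); np.inf marks missing edges."""
    dist = weights.copy()
    np.fill_diagonal(dist, 0.0)
    for k in range(dist.shape[0]):
        # Allow paths that pass through node k.
        dist = np.minimum(dist, dist[:, k:k + 1] + dist[k:k + 1, :])
    return dist

def sinkhorn(a, b, cost, eps=1.0, n_iters=200):
    """Entropy-regularized OT plan between histograms a and b (dense kernel)."""
    kernel = np.exp(-cost / eps)
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iters):
        u = a / (kernel @ v)        # match row marginals
        v = b / (kernel.T @ u)      # match column marginals
    return u[:, None] * kernel * v[None, :]

# Toy example: a cycle graph with 6 nodes and unit edge weights.
n = 6
weights = np.full((n, n), np.inf)
for i in range(n):
    j = (i + 1) % n
    weights[i, j] = weights[j, i] = 1.0

cost = geodesic_distances(weights)
a = np.full(n, 1.0 / n)
b = np.array([0.3, 0.1, 0.1, 0.2, 0.2, 0.1])
plan = sinkhorn(a, b, cost)
transport_cost = float(np.sum(plan * cost))
print(cost[0, 3], transport_cost)
```

The returned `plan` has row sums close to `a` and column sums equal to `b`; the near-linear-time methods discussed in this part target the matrix-vector products with `kernel`, which dominate the cost of this baseline.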

4. Graph Random Features
https://arxiv.org/abs/2305.00156
https://arxiv.org/abs/2310.04859

Graph Random Features (GRFs) provide new continuous representations of points in graph metric spaces defined via graph kernels, as well as representations of entire graphs. Furthermore, they come with strong theoretical guarantees via the theory of graph kernels (potentially learnable, e.g. with deep neural networks). They are also efficient to compute, providing a gateway to explicitly modeling graphs with hundreds of thousands of nodes and more. In this part of the presentation, we will provide an introduction to the theory of GRFs, show how they can be further scaled up to implicitly defined networks, and discuss several applications, most notably in particle-based dynamics models for robotics and in quantum computations.
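The following is a simplified sketch of a random-walk feature construction in the spirit of GRFs, written under stated assumptions rather than taken from the cited papers. Each node launches random walks that halt with probability `p_halt` after every step; a walk carries a "load" that is reweighted so that its expected deposit along paths of length k equals the corresponding entry of W^k, scaled by a modulation value f(k). With f(k) = 1 / (2^k k!), the dot product of two feature vectors then approximates the diffusion kernel exp(W). The function name, the toy cycle graph, the edge weight 0.2, and the number of walks are all illustrative choices.

```python
import math
import numpy as np

def graph_random_features(W, f, n_walks=2000, p_halt=0.5, seed=0):
    """Monte Carlo features phi with E[phi[i]] = row i of sum_k f(k) W^k.

    Assumes W is a symmetric weighted adjacency matrix with no isolated nodes.
    """
    rng = np.random.default_rng(seed)
    n = W.shape[0]
    neighbors = [np.flatnonzero(W[i]) for i in range(n)]
    phi = np.zeros((n, n))
    for i in range(n):
        for _ in range(n_walks):
            node, load, step = i, 1.0, 0
            while True:
                phi[i, node] += load * f(step)            # deposit at current node
                nxt = rng.choice(neighbors[node])         # uniform neighbor
                # Importance weight: undo the sampling probability of this step.
                load *= len(neighbors[node]) / (1.0 - p_halt) * W[node, nxt]
                node, step = nxt, step + 1
                if rng.random() < p_halt:
                    break
    return phi / n_walks

# Toy graph: cycle with 6 nodes, small edge weights to keep variance low.
n = 6
W = np.zeros((n, n))
for i in range(n):
    j = (i + 1) % n
    W[i, j] = W[j, i] = 0.2

f = lambda k: 0.5 ** k / math.factorial(k)                # f * f gives 1 / k!
phi = graph_random_features(W, f)
K_approx = phi @ phi.T

# Exact diffusion kernel exp(W) via the eigendecomposition of symmetric W.
eigvals, eigvecs = np.linalg.eigh(W)
K_exact = (eigvecs * np.exp(eigvals)) @ eigvecs.T
max_error = float(np.abs(K_approx - K_exact).max())
print(max_error)
```

Each feature vector is built only from the walks started at its own node, so the features can be computed node by node without forming any matrix power; the diagonal of `K_approx` carries a small bias here because both factors reuse the same walks.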
