Workshop: Time Series Workshop

Morning Poster Session: ST-DETR: Spatio-Temporal Object Traces Attention Detection Transformer

Eslam Mohamed Abd El Rahman


We propose ST-DETR, a Spatio-Temporal Transformer-based architecture for object detection from a sequence of temporal frames. We treat the temporal frames as sequences in both space and time and employ full attention mechanisms to take advantage of the feature correlations over both dimensions. This treatment enables us to handle a frame sequence as temporal traces of object features at every spatial location. We explore two possible approaches: the early aggregation of spatial features over the temporal dimension, and the late temporal aggregation of object query spatial features. Moreover, we propose a novel Temporal Positional Embedding technique to encode the time sequence information. To evaluate our approach, we choose the Moving Object Detection (MOD) task, since it is a perfect candidate to showcase the importance of the temporal dimension. Results show a significant 5% mAP improvement on the KITTI MOD dataset over the 1-step spatial baseline.
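
To make the idea of treating frames as one sequence over space and time more concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not specify how the Temporal Positional Embedding is built, so this sketch assumes a standard sinusoidal encoding indexed by frame number; the function names `sinusoidal_embedding` and `flatten_with_temporal_embedding` and the `(T, H, W, C)` feature layout are hypothetical choices made for illustration.

```python
import numpy as np


def sinusoidal_embedding(num_positions, dim):
    """Standard sinusoidal encoding, one row per position (dim must be even).

    Assumption: the abstract does not say which encoding ST-DETR uses.
    """
    positions = np.arange(num_positions)[:, None]                   # (P, 1)
    freqs = np.exp(-np.log(10000.0) * np.arange(0, dim, 2) / dim)   # (dim/2,)
    angles = positions * freqs[None, :]                             # (P, dim/2)
    emb = np.zeros((num_positions, dim))
    emb[:, 0::2] = np.sin(angles)
    emb[:, 1::2] = np.cos(angles)
    return emb


def flatten_with_temporal_embedding(features):
    """Turn per-frame feature maps into one spatio-temporal token sequence.

    features: array of shape (T, H, W, C), a hypothetical backbone output.
    Returns an array of shape (T * H * W, C) in which every token carries
    an additive embedding that identifies its frame index.
    """
    T, H, W, C = features.shape
    temporal = sinusoidal_embedding(T, C)              # (T, C)
    tagged = features + temporal[:, None, None, :]     # broadcast over H and W
    return tagged.reshape(T * H * W, C)


# Three frames of 2x2 feature maps with 8 channels (all zeros for clarity).
frames = np.zeros((3, 2, 2, 8))
tokens = flatten_with_temporal_embedding(frames)
print(tokens.shape)
```

A sequence built this way could be passed to a full attention layer so that each location attends across both space and time, which roughly corresponds to the early aggregation approach described above; the late aggregation variant would instead combine per-frame object query features after spatial attention.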