Poster Thu, Jul 9, 2026 • 2:30 PM – 4:15 PM KST Coex: HALL A

The benefits of full data shuffle, now with optimal I/O cost: $k$-wise independence and matrix transposition to the rescue

Peyman Afshani ⋅ Rezaul Chowdhury ⋅ Mayank Goswami ⋅ Jens Kristian R Schou ⋅ Francesco Silvestri ⋅ Mariafiore Tognon

Abstract

It is known that RandomShuffle, the without replacement version of Stochastic Gradient Descend (SGD), converges faster than with-replacement SGD. However, RandomShuffle requires to uniformly perform a random permutation of the input sequence, which is known to have an high I/O complexity due to data movements over the memory hierarchy. In this paper, we propose a shuffling algorithm with a linear I/O complexity that generates almost-uniformly random permutations with rigorous mathematical guarantees. Specifically, we show that the shuffling algorithm can generate $2$-wise independent permutations. Furthermore, we can extend to $k$-wise independency with a small error in the probability distribution, if the fast memory has at least $k$ memory blocks. These results allow us to reach the same expected theoretical convergence as RandomShuffle while achieving optimal linear I/O cost.