

Poster in Workshop: High-dimensional Learning Dynamics Workshop: The Emergence of Structure and Reasoning

Analysing feature learning of gradient descent using periodic functions

Jaehui Hwang · Taeyoung Kim · Hongseok Yang


Abstract: We present an analysis of feature learning in neural networks when the target functions are periodic functions applied to one-dimensional projections of the input. Previously, Damian et al. (2022) considered a similar question for target functions of the form $f^*(x) = p^*(\langle u_1,x\rangle,\ldots,\langle u_r,x\rangle)$ for some vectors $u_1,\ldots,u_r \in \mathbb{R}^d$ and a polynomial $p^*$, and proved that feature learning occurs during the training of a shallow neural network, even when the first-layer weights of the network are updated only once during training. Here, feature learning refers to a subset of the first-layer weights $w_1,\ldots,w_m \in \mathbb{R}^d$ of the trained network pointing in the same directions as $\{u_1,\ldots,u_r\}$. We show that for periodic target functions, the same single gradient-based update of the first-layer weights induces feature learning in a shallow neural network, despite the additional challenge that feature learning for periodic functions now involves both the directions and the magnitudes of $\{u_1,\ldots,u_r\}$: a useful feature for, say, $f^*(x) = \sin(\langle u,x\rangle)$ is a vector $w \in \mathbb{R}^d$ such that $\angle(w, u) \approx 0$ and $\|w\| \approx \|u\|$. Our theoretical result also provides a sample-complexity guarantee for learning such periodic target functions. Experimental results further support our theoretical findings and illustrate the benefits of feature learning for a broader class of periodic target functions.
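The sketch below is a rough, self-contained illustration of this setting, not the authors' implementation: it fits a shallow ReLU network to a single-index periodic target $f^*(x) = \sin(\langle u, x\rangle)$, updates the first-layer weights with a single full-batch gradient step (the second-layer weights are kept fixed), and then reports the direction and norm of the most aligned neuron. The dimension, width, sample size, and learning rate are illustrative assumptions rather than the scalings used in the analysis.

```python
# Minimal sketch (illustrative assumptions, not the authors' code) of single-step
# feature learning for a periodic single-index target f*(x) = sin(<u, x>).
import numpy as np

rng = np.random.default_rng(0)
d, m, n, lr = 20, 200, 5000, 5.0   # dimension, width, samples, step size (assumed)

u = rng.standard_normal(d)
u *= 2.0 / np.linalg.norm(u)        # target direction with a fixed magnitude
X = rng.standard_normal((n, d))
y = np.sin(X @ u)                    # periodic single-index target

W = rng.standard_normal((m, d)) / np.sqrt(d)   # first-layer weights w_1,...,w_m
a = rng.choice([-1.0, 1.0], size=m) / m        # fixed second-layer weights

relu = lambda z: np.maximum(z, 0.0)

# One full-batch gradient step on W for the squared loss, second layer frozen,
# matching the "first-layer weights updated only once" regime.
pre = X @ W.T                                   # (n, m) pre-activations
resid = relu(pre) @ a - y                       # (n,) residuals
grad_W = ((resid[:, None] * (pre > 0) * a).T @ X) / n   # (m, d) gradient
W_new = W - lr * grad_W

# Feature-learning diagnostic: angle to u and norm of the most aligned neuron.
cos = (W_new @ u) / (np.linalg.norm(W_new, axis=1) * np.linalg.norm(u))
j = np.argmax(np.abs(cos))
print(f"best |cos angle(w_j, u)| = {abs(cos[j]):.3f}")
print(f"||w_j|| = {np.linalg.norm(W_new[j]):.3f}, ||u|| = {np.linalg.norm(u):.3f}")
```

Whether strong alignment in both direction and magnitude actually emerges depends on the scaling of the width, sample size, and step size; the code above only illustrates the training procedure and the diagnostic, not the regime covered by the theorem.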
