This talk gives an overview of recent results in a line of theoretical work that started three decades ago in statistical physics. We first discuss the teacher-student setting of generalized linear regression. We illustrate the presence of the interpolation peak for classification with ridge loss and its vanishing with regularization. We show that, in the spherical perceptron, optimally regularized logistic regression comes very close to the Bayes-optimal accuracy. We contrast this with the non-convex case of phase retrieval, where canonical empirical risk minimization performs poorly compared to the Bayes-optimal error. We then move towards learning with hidden units and analyze double descent in learning with generic fixed features and any convex loss. The formulas we obtain are generic enough to describe the learning of the last layer of neural networks for realistic data and networks. Finally, for phase retrieval, we are able to analyze gradient descent in the feature-learning regime of a two-layer neural network, where we show that overparametrization allows a considerable reduction of the sample complexity. Concretely, an overparametrized neural network needs a number of samples only twice the input dimension, while a non-overparametrized network needs a constant factor more, and kernel regression needs quadratically many samples in the input dimension.
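
As a rough illustration of the teacher-student setting with ridge loss mentioned above, the following NumPy sketch draws a random teacher vector, labels Gaussian inputs by the sign of the teacher's output, fits a ridge-regularized least-squares student, and evaluates its classification error from the teacher-student angle. Everything here is an assumption made for illustration and is not taken from the talk: the function name `teacher_student_ridge_error`, the Gaussian input and teacher distributions, the 1/sqrt(d) scaling, and the chosen values of the dimension, sample ratios, and regularization strength.

```python
import numpy as np


def teacher_student_ridge_error(n, d, lam, rng):
    """Classification error of a ridge student trained on sign labels.

    Assumed setup (illustrative only): Gaussian teacher, i.i.d. Gaussian
    inputs, labels y = sign(x . w_star), square loss with L2 penalty lam.
    """
    # Teacher weights and training inputs.
    w_star = rng.standard_normal(d)
    X = rng.standard_normal((n, d)) / np.sqrt(d)
    y = np.sign(X @ w_star)

    # Ridge estimator: w_hat = (X^T X + lam * I)^{-1} X^T y.
    w_hat = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

    # For isotropic Gaussian inputs, the probability that student and
    # teacher disagree on a fresh sample equals their angle divided by pi.
    cos = w_hat @ w_star / (np.linalg.norm(w_hat) * np.linalg.norm(w_star))
    return np.arccos(np.clip(cos, -1.0, 1.0)) / np.pi


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 100
    # Sweep the sample ratio n/d for a weak and a stronger regularization.
    for alpha in (0.5, 1.0, 2.0, 5.0):
        n = int(alpha * d)
        weak = teacher_student_ridge_error(n, d, 1e-6, rng)
        strong = teacher_student_ridge_error(n, d, 1.0, rng)
        print(f"n/d={alpha:.1f}  lam=1e-6: {weak:.3f}  lam=1: {strong:.3f}")
```

Sweeping the ratio n/d at nearly vanishing regularization is one way to look for the interpolation peak near n = d in such a toy experiment, and repeating the sweep with larger `lam` shows how regularization affects it. Single runs at this small dimension are noisy, so averaging over several seeds would be needed before reading anything quantitative from the output.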