Optimization lies at the heart of many exciting developments in machine learning, statistics, and signal processing. As models become more complex and datasets grow larger, finding efficient, reliable methods with provable guarantees is a primary goal in these fields.
In the last few decades, much effort has been devoted to the development of first-order methods. These methods enjoy a low per-iteration cost, have optimal complexity, are easy to implement, and have proven effective for most machine learning applications. First-order methods, however, have significant limitations: (1) they require fine hyper-parameter tuning, (2) they do not incorporate curvature information and are therefore sensitive to ill-conditioning, and (3) they are often unable to fully exploit the power of distributed computing architectures.
Higher-order methods, such as Newton, quasi-Newton and adaptive gradient descent methods, are extensively used in many scientific and engineering domains. At least in theory, these methods possess several nice features: they exploit local curvature information to mitigate the effects of ill-conditioning, they avoid or diminish the need for hyper-parameter tuning, and they have enough concurrency to take advantage of distributed computing environments. Researchers have even developed stochastic versions of higher-order methods that achieve speed and scalability by incorporating curvature information in an economical and judicious manner. Nevertheless, higher-order methods are often "undervalued."
This workshop will attempt to shed light on this statement. Topics of interest include, but are not limited to, second-order methods, adaptive gradient descent methods, regularization techniques, as well as techniques based on higher-order derivatives.

Fri 8:00 a.m. - 8:15 a.m. | Introductory notes (Talk)
Introductory notes for the workshop.

Fri 8:15 a.m. - 9:00 a.m. | Talk by Peter Richtarik - Fast linear convergence of randomized BFGS (Talk)
Since the late 1950s, when quasi-Newton methods first appeared, they have become one of the most widely used and efficient algorithmic paradigms for unconstrained optimization. Despite their immense practical success, there is little theory that explains why these methods are so efficient. We provide a semi-local rate of convergence for the randomized BFGS method which can be significantly better than that of gradient descent, finally giving theoretical evidence supporting the superior empirical performance of the method.
Peter Richtarik

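The quasi-Newton machinery behind such results admits a compact illustration. The sketch below is not the randomized BFGS method analysed in the talk; it only shows the textbook BFGS inverse-Hessian update applied along randomly drawn directions on a fixed quadratic, where the estimate drifts toward the true inverse Hessian. The dimension, iteration count, and single-vector "sketches" are illustrative assumptions.

```python
import numpy as np

# Toy sketch, not the randomized BFGS of the talk: apply the textbook BFGS
# inverse-Hessian update along random directions on a quadratic
# f(x) = 0.5 * x^T A x and watch the estimate H drift toward A^{-1}.
rng = np.random.default_rng(0)
d = 20
M = rng.standard_normal((d, d))
A = M @ M.T + np.eye(d)                      # SPD Hessian of the quadratic
H = np.eye(d)                                # inverse-Hessian estimate

for _ in range(1000):
    s = rng.standard_normal(d)               # random direction (a crude "sketch")
    y = A @ s                                # curvature pair: y = Hessian times s
    rho = 1.0 / (y @ s)
    V = np.eye(d) - rho * np.outer(s, y)
    H = V @ H @ V.T + rho * np.outer(s, s)   # BFGS update of the inverse estimate

print(np.linalg.norm(H - np.linalg.inv(A)))  # error decreases as updates accumulate
```
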
Fri 9:00 a.m. - 9:10 a.m. | Q&A with Peter Richtarik (Discussion)

Fri 9:10 a.m. - 9:55 a.m. | Talk by Francis Bach - Second Order Strikes Back: Globally convergent Newton methods for ill-conditioned generalized self-concordant losses (Talk)
We will study large-scale convex optimization algorithms based on the Newton method applied to regularized generalized self-concordant losses, which include logistic regression and softmax regression. We first prove that our new, simple scheme based on a sequence of problems with decreasing regularization parameters is globally convergent, and that this convergence is linear with a constant factor that scales only logarithmically with the condition number. In the parametric setting, we obtain an algorithm with the same scaling as regular first-order methods but with improved behavior, in particular on ill-conditioned problems. Second, in the non-parametric machine learning setting, we provide an explicit algorithm combining the previous scheme with Nyström projection techniques, and prove that it achieves optimal generalization bounds with a time complexity of order O(n√n), a memory complexity of order O(n), and no dependence on the condition number, generalizing the results known for least-squares regression. (Joint work with Ulysse Marteau-Ferey and Alessandro Rudi, https://arxiv.org/abs/1907.01771.)
Francis Bach

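As a bare-bones illustration of the ingredients named in the abstract (Newton steps on a regularized logistic loss, solved along a decreasing-regularization path with warm starts), here is a NumPy sketch. It is not the paper's algorithm; the data, regularization schedule, and stopping rule are arbitrary choices.

```python
import numpy as np

def newton_logreg(X, y, lam, w, tol=1e-8, max_iter=50):
    """Plain (undamped) Newton for l2-regularized logistic regression.
    X: (n, d), y in {-1, +1}, lam: regularization strength."""
    n, d = X.shape
    for _ in range(max_iter):
        margins = y * (X @ w)
        p = 1.0 / (1.0 + np.exp(margins))           # sigmoid(-y * x^T w)
        grad = -(X.T @ (y * p)) / n + lam * w
        S = p * (1.0 - p)                           # per-sample curvature
        H = (X.T * S) @ X / n + lam * np.eye(d)
        w = w - np.linalg.solve(H, grad)            # Newton step
        if np.linalg.norm(grad) < tol:
            break
    return w

# Continuation: solve a sequence of problems with decreasing regularization,
# warm-starting each solve from the previous solution.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 10))
y = np.sign(X @ rng.standard_normal(10) + 0.1 * rng.standard_normal(500))
w = np.zeros(10)
for lam in [1.0, 0.1, 0.01, 0.001]:
    w = newton_logreg(X, y, lam, w)
```
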
Fri 9:55 a.m. - 10:05 a.m. | Q&A with Francis Bach (Discussion)

Fri 10:05 a.m. - 10:30 a.m. | Break (until 10:30 a.m. PDT)

Fri 10:30 a.m. - 10:40 a.m. | Spotlight talk 1 - A Second-Order Optimization Algorithm for Solving Problems Involving Group Sparse Regularization (Talk)
We consider the problem of minimizing the sum of a convex function and a sparsity-inducing group regularizer. Problems involving such regularizers arise in modern machine learning applications, often for the purpose of obtaining models that are easier to interpret and that have higher predictive accuracy. We present a new second-order algorithm for solving such problems that uses subspace acceleration, domain decomposition, and support identification. Our analysis shows that our algorithm has strong complexity properties, and preliminary numerical results for binary classification based on regularized logistic regression show that our approach is efficient and robust.
Daniel Robinson

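A typical instance of the group regularizer in this setting is the group-l2 (group lasso) penalty, whose proximal operator has the closed form sketched below. This is generic background rather than the algorithm from the talk; the group partition and penalty weight in the example are made up.

```python
import numpy as np

def prox_group_l2(w, groups, lam):
    """Proximal operator of lam * sum_g ||w_g||_2 (group lasso penalty).
    groups: list of index arrays partitioning the coordinates of w."""
    out = w.copy()
    for g in groups:
        norm_g = np.linalg.norm(w[g])
        # Block soft-thresholding: shrink the whole group toward zero,
        # zeroing it out entirely when its norm is below lam.
        out[g] = 0.0 if norm_g <= lam else (1.0 - lam / norm_g) * w[g]
    return out

w = np.array([0.3, -0.2, 2.0, 1.5, 0.05])
groups = [np.array([0, 1]), np.array([2, 3]), np.array([4])]
print(prox_group_l2(w, groups, lam=0.5))
# the first and last groups are small enough to be zeroed out entirely
```
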
Fri 10:40 a.m. - 10:50 a.m. | Spotlight talk 2 - Ridge Riding: Finding diverse solutions by following eigenvectors of the Hessian (Talk)
Over the last decade, a single algorithm has changed many facets of our lives: Stochastic Gradient Descent (SGD). In the era of ever-decreasing loss functions, SGD and its various offspring are a key component of the success of deep neural networks. However, in some cases it matters which local optimum is found, and this is often context-dependent. Examples frequently arise in machine learning, from shape-versus-texture features to ensemble methods and zero-shot coordination. In these settings, there are desired types of solutions which SGD on 'standard' loss functions will not find. In this paper, we present a different approach. Rather than following the locally greedy gradient, we instead follow the eigenvectors of the Hessian, which we call 'ridges'. By iteratively following and branching amongst the ridges, we effectively span the loss surface to find qualitatively different solutions. We show both theoretically and experimentally that our method, called Ridge Riding, offers a promising direction.
Jack Parker-Holder

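Following a Hessian eigenvector needs only Hessian-vector products, which reverse-mode autodiff provides at roughly the cost of two gradients. Below is a rough PyTorch sketch of a single "ridge-following" step on a toy model, using power iteration to obtain the leading eigenvector. The model, data, step size, and the omission of the method's branching and eigenvector-selection logic are all simplifying assumptions.

```python
import torch

def hvp(loss, params, vec):
    """Hessian-vector product via double backprop."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat = torch.cat([g.reshape(-1) for g in grads])
    hv = torch.autograd.grad((flat * vec).sum(), params, retain_graph=True)
    return torch.cat([h.reshape(-1) for h in hv])

def top_eigvec(loss, params, n_params, iters=50):
    """Leading Hessian eigenvector by power iteration on HVPs."""
    v = torch.randn(n_params)
    v /= v.norm()
    for _ in range(iters):
        v = hvp(loss, params, v)
        v /= v.norm()
    return v

# Toy model: a single linear layer on random data.
torch.manual_seed(0)
model = torch.nn.Linear(5, 1)
x, y = torch.randn(32, 5), torch.randn(32, 1)
loss = torch.nn.functional.mse_loss(model(x), y)

params = list(model.parameters())
n = sum(p.numel() for p in params)
ridge = top_eigvec(loss, params, n)

# One step along the ridge instead of along the negative gradient.
lr = 0.01
with torch.no_grad():
    offset = 0
    for p in params:
        k = p.numel()
        p -= lr * ridge[offset:offset + k].view_as(p)
        offset += k
```
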
Fri 10:50 a.m. - 11:00 a.m. | Spotlight talk 3 - PyHessian: Neural Networks Through the Lens of the Hessian (Talk)
We present PyHessian, a new scalable framework that enables fast computation of Hessian (i.e., second-order derivative) information for deep neural networks. PyHessian enables fast computation of the top Hessian eigenvalues, the Hessian trace, and the full Hessian eigenvalue/spectral density; it supports distributed-memory execution on cloud/supercomputer systems and is available as open source. We show that this framework can be used to analyze neural network models, including the topology of the loss landscape (i.e., curvature information), to gain insight into the behavior of different models/optimizers. In particular, we analyze the effect of Batch Normalization layers on the trainability of NNs. We find that, contrary to common belief, Batch Normalization does not necessarily make the loss landscape smoother, especially for shallow networks.
Amir Gholaminejad

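The quantities PyHessian reports are all built from Hessian-vector products; for example, the Hessian trace can be estimated with Hutchinson's method, as in the hand-rolled sketch below. This is not PyHessian's own API, and the toy model, sample count, and Rademacher probes are illustrative choices.

```python
import torch

def hutchinson_trace(loss, params, n_samples=100):
    """Estimate tr(Hessian) as E[v^T H v] with Rademacher probe vectors v."""
    flat_dim = sum(p.numel() for p in params)
    est = 0.0
    for _ in range(n_samples):
        v = (torch.randint(0, 2, (flat_dim,)) * 2 - 1).float()   # entries in {-1, +1}
        grads = torch.autograd.grad(loss, params, create_graph=True)
        flat = torch.cat([g.reshape(-1) for g in grads])
        hv = torch.autograd.grad((flat * v).sum(), params, retain_graph=True)
        hv = torch.cat([h.reshape(-1) for h in hv])
        est += (v * hv).sum().item()
    return est / n_samples

torch.manual_seed(0)
model = torch.nn.Linear(5, 1)
x, y = torch.randn(64, 5), torch.randn(64, 1)
loss = torch.nn.functional.mse_loss(model(x), y)
print(hutchinson_trace(loss, list(model.parameters())))
```
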
Fri 11:00 a.m. - 11:45 a.m. | Talk by Coralia Cartis - Dimensionality reduction techniques for large-scale optimization problems (Talk)
Known by many names, sketching techniques allow random projections of data from high to low dimensions while preserving pairwise distances. This talk explores ways to use sketching so as to improve the scalability of algorithms for diverse classes of optimization problems and applications, from linear to nonlinear, local to global, derivative-based to derivative-free. Regression problems and Gauss-Newton techniques will receive particular attention. Numerical illustrations on standard optimization test problems as well as on some machine learning set-ups will be presented. This work is joint with Jan Fiala (NAG Ltd), Jaroslav Fowkes (Oxford), Estelle Massart (Oxford and NPL), Adilet Otemissov (Oxford and Turing), Alex Puiu (Oxford), Lindon Roberts (Australian National University, Canberra), and Zhen Shao (Oxford).
Coralia Cartis

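For the regression case, the basic move is to compress a tall least-squares problem with a random sketch before solving it. The snippet below is a generic illustration, not taken from the talk: a dense Gaussian sketch with arbitrarily chosen sizes. In practice, structured sketches (subsampled randomized Hadamard transforms, count sketches) are preferred so that applying the sketch itself is cheap.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 20000, 50, 500          # tall problem, sketched down to m rows

A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.01 * rng.standard_normal(n)

# Gaussian sketch: a random projection that approximately preserves
# pairwise distances, and hence the least-squares geometry.
S = rng.standard_normal((m, n)) / np.sqrt(m)
x_sketch, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
x_full, *_ = np.linalg.lstsq(A, b, rcond=None)

print(np.linalg.norm(x_sketch - x_full) / np.linalg.norm(x_full))
```
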
Fri 11:45 a.m. - 11:55 a.m. | Q&A with Coralia Cartis (Discussion)

Fri 11:55 a.m. - 1:30 p.m. | Break (until 1:30 p.m. PDT)

Fri 1:30 p.m. - 1:40 p.m. | Spotlight talk 4 - MomentumRNN: Integrating Momentum into Recurrent Neural Networks (Talk)
Designing deep neural networks is an art that often involves an expensive search over candidate architectures. To overcome this for recurrent neural nets (RNNs), we establish a connection between the hidden-state dynamics in an RNN and gradient descent (GD). We then integrate momentum into this framework and propose a new family of RNNs, called MomentumRNNs. We theoretically prove and numerically demonstrate that MomentumRNNs alleviate the vanishing gradient issue in training RNNs. We study the momentum long short-term memory (MomentumLSTM) and verify its advantages in convergence speed and accuracy over its LSTM counterpart across a variety of benchmarks, with little compromise in computational or memory efficiency. We also demonstrate that MomentumRNN is applicable to many types of recurrent cells, including those in state-of-the-art orthogonal RNNs. Finally, we show that other advanced momentum-based optimization methods, such as Adam and Nesterov accelerated gradients with a restart, can be easily incorporated into the MomentumRNN framework to design new recurrent cells with even better performance.
Minh Nguyen

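The abstract's core idea, treating the hidden-state recurrence like a gradient-descent update and adding a velocity term to it, can be sketched as a recurrent cell with an extra momentum state. The cell below is a guess at the general shape rather than the paper's exact recurrence; the momentum coefficient mu, the scale s, and the tanh nonlinearity are placeholder choices.

```python
import torch
import torch.nn as nn

class MomentumRNNCell(nn.Module):
    """Vanilla tanh RNN cell augmented with a velocity state, in the spirit
    of momentum gradient descent (a sketch, not the paper's exact cell)."""
    def __init__(self, input_size, hidden_size, mu=0.6, s=1.0):
        super().__init__()
        self.Wx = nn.Linear(input_size, hidden_size, bias=False)
        self.Wh = nn.Linear(hidden_size, hidden_size)
        self.mu, self.s = mu, s

    def forward(self, x_t, state):
        h_prev, v_prev = state
        v_t = self.mu * v_prev + self.s * self.Wx(x_t)   # momentum on the input drive
        h_t = torch.tanh(self.Wh(h_prev) + v_t)
        return h_t, (h_t, v_t)

cell = MomentumRNNCell(8, 16)
h = torch.zeros(4, 16)
v = torch.zeros(4, 16)
for t in range(10):                                      # unroll over a toy sequence
    x_t = torch.randn(4, 8)
    out, (h, v) = cell(x_t, (h, v))
```
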
Fri 1:40 p.m. - 1:50 p.m. | Spotlight talk 5 - Step-size Adaptation Using Exponentiated Gradient Updates (Talk)
Optimizers like Adam and AdaGrad have been very successful in training large-scale neural networks. Yet, the performance of these methods is heavily dependent on a carefully tuned learning rate schedule. We show that in many large-scale applications, augmenting a given optimizer with an adaptive tuning method for the step size greatly improves performance. More precisely, we maintain a global step-size scale for the update as well as a gain factor for each coordinate. We adjust the global scale based on the alignment of the average gradient and the current gradient vectors. A similar approach is used for updating the local gain factors. This type of step-size scale tuning has been done before with gradient descent updates. In this paper, we update the step-size scale and the gain variables with exponentiated gradient updates instead. Experimentally, we show that our approach can achieve compelling accuracy on standard models without using any specially tuned learning rate schedule. We also show the effectiveness of our approach in quickly adapting to distribution shifts in the data during training.
Ehsan Amid

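A bare-bones caricature of the mechanism described in the abstract: keep a running average of gradients, and multiply a global step-size scale up or down (an exponentiated, i.e. multiplicative, update) depending on how well the current gradient aligns with that average. This is not the authors' algorithm; the per-coordinate gain factors are omitted and all constants are arbitrary.

```python
import numpy as np

def sgd_with_adaptive_scale(grad_fn, w, base_lr=0.01, beta=0.9,
                            eta=0.02, steps=100):
    """SGD whose global step-size scale is adapted multiplicatively from the
    alignment between the current gradient and a running gradient average.
    A generic sketch, not the algorithm from the talk."""
    scale = 1.0
    avg = np.zeros_like(w)
    for _ in range(steps):
        g = grad_fn(w)
        denom = np.linalg.norm(avg) * np.linalg.norm(g) + 1e-12
        align = (avg @ g) / denom            # cosine alignment in [-1, 1]
        scale *= np.exp(eta * align)         # exponentiated (multiplicative) update
        avg = beta * avg + (1 - beta) * g
        w = w - base_lr * scale * g
    return w, scale

# Mildly ill-conditioned quadratic: gradients stay aligned while the iterate
# decays monotonically, so the global scale drifts upward.
D = np.array([2.0, 1.0, 0.1])
w, scale = sgd_with_adaptive_scale(lambda w: D * w, np.ones(3))
print(w, scale)
```
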
Fri 1:50 p.m. - 2:00 p.m. | Spotlight talk 6 - Competitive Mirror Descent (Talk)
Constrained competitive optimization involves multiple agents trying to minimize conflicting objectives, subject to constraints. This is a highly expressive modeling language that subsumes most of modern machine learning. In this work we propose competitive mirror descent (CMD): a general method for solving such problems based on first-order information that can be obtained by automatic differentiation. First, by adding Lagrange multipliers, we obtain a simplified constraint set with an associated Bregman potential. At each iteration, we then solve for the Nash equilibrium of a regularized bilinear approximation of the full problem to obtain a direction of movement of the agents. Finally, we obtain the next iterate by following this direction according to the dual geometry induced by the Bregman potential. By using the dual geometry we obtain feasible iterates despite only solving a linear system at each iteration, eliminating the need for projection steps while still accounting for the global nonlinear structure of the constraint set. As a special case we obtain a novel competitive multiplicative weights algorithm for problems on the positive cone.
Florian Schäfer

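The "competitive multiplicative weights" special case reflects the fact that mirror descent with the negative-entropy Bregman potential on the positive cone is a purely multiplicative update, so iterates stay feasible without projection. The single-agent sketch below shows only that mirror-descent step; the competitive, two-player part of CMD (solving a regularized bilinear local game for the direction) is omitted, and the objective is an arbitrary example.

```python
import numpy as np

def mirror_descent_positive(grad_fn, x0, lr=0.1, steps=100):
    """Mirror descent on the positive cone with the negative-entropy
    potential: the update is multiplicative, so iterates stay positive
    without any projection step. Single-agent sketch only."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x * np.exp(-lr * grad_fn(x))
    return x

# Minimize f(x) = <c, x> - sum(log(x_i)) over x > 0; the optimum is x_i = 1/c_i.
c = np.array([1.0, 2.0, 4.0])
x_star = mirror_descent_positive(lambda x: c - 1.0 / x, np.ones(3))
print(x_star)        # approaches [1.0, 0.5, 0.25]
```
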
Fri 2:00 p.m. - 2:15 p.m. | Industry Panel - Talk by Boris Ginsburg - Large scale deep learning: new trends and optimization challenges (Talk)
I will discuss two major trends in deep learning. The first trend is the exponential growth in the size of models: from 340M parameters (BERT-large) in 2018 to 175B (GPT-3) in 2020. We need new, more memory-efficient algorithms to train such huge models. The second trend is the "BERT approach," where a model is first pre-trained in an unsupervised or self-supervised manner on a large unlabeled dataset and then fine-tuned for another task using a smaller labeled dataset. This trend poses new theoretical problems. Next, I will discuss the practical need for a theoretical foundation for the regularization methods used in deep learning practice: data augmentation, dropout, label smoothing, etc. Finally, I will describe an application-driven design of new optimization methods, using NovoGrad as an example.
Boris Ginsburg

Fri 2:15 p.m. - 2:30 p.m. | Industry Panel - Talk by Jonathan Hseu - ML Models in Production (Talk)
We discuss the difficulties of training and improving ML models on large datasets in production. We also go into the process of an engineer working on ML models, and the challenges of trading off cost, model quality, and performance. Finally, we go into a wishlist of optimization improvements that could improve the workflow of engineers working on these models.
Jonathan Hseu

Fri 2:30 p.m. - 2:45 p.m. | Industry Panel - Talk by Andres Rodriguez - Shifting the DL industry to 2nd order methods (Talk)
In this talk, we review the topology design process used by data scientists and explain why 1st-order methods are computationally expensive in the design process. We explore the benefits of 2nd-order methods to reduce the topology design cost and highlight recent work that approximates the inverse Hessian. We conclude with recommendations to accelerate the adoption of these methods in the DL ecosystem.
Andres Rodriguez

Fri 2:45 p.m. - 3:00 p.m. | Industry Panel - Talk by Lin Xiao - Statistical Adaptive Stochastic Gradient Methods (Talk)
Stochastic gradient descent (SGD) and its many variants serve as the workhorses of deep learning. One of the foremost pain points in using these methods in practice is hyperparameter tuning, especially of the learning rate (step size). We propose a statistical adaptive procedure called SALSA to automatically schedule the learning rate for a broad family of stochastic gradient methods. SALSA first uses a smoothed line-search procedure to find a good initial learning rate, then automatically switches to a statistical method which detects stationarity of the learning process under a fixed learning rate and drops the learning rate by a constant factor whenever stationarity is detected. The combined procedure is highly robust and autonomous, and it matches the performance of the best hand-tuned methods on several popular deep learning tasks.
Lin Xiao

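SALSA's stationarity detector is a specific statistical test; as a much cruder stand-in for the same control logic (hold the learning rate fixed, cut it by a constant factor once progress stalls), here is a small sketch that compares windowed loss averages. The window size, tolerance, drop factor, and the synthetic loss stream in the example are all arbitrary.

```python
def stationarity_lr_schedule(losses_stream, lr=0.1, drop=0.1,
                             window=100, tol=1e-3):
    """Crude stand-in for SALSA's idea (not its actual statistical test):
    keep the learning rate fixed, and multiply it by `drop` whenever the
    mean loss over the last `window` steps stops improving on the mean
    over the previous window."""
    lrs, buf = [], []
    prev_mean = float("inf")
    for loss in losses_stream:
        buf.append(loss)
        if len(buf) == window:
            mean = sum(buf) / window
            if mean > prev_mean - tol:   # no sufficient decrease: stationary
                lr *= drop
            prev_mean = mean
            buf = []
        lrs.append(lr)
    return lrs

# Synthetic example: once the loss stream plateaus, the rate gets cut.
fake_losses = [1.0 / (1 + 0.05 * t) for t in range(300)] + [0.06] * 300
print(sorted(set(stationarity_lr_schedule(fake_losses)), reverse=True))
```
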
Fri 3:00 p.m. - 3:30 p.m. | Industry panel Q&A (Discussion)

Fri 3:30 p.m. - 4:15 p.m. | Talk by Rachel Ward - Weighted Optimization: better generalization by smoother interpolation (Talk)
We provide a rigorous analysis of how implicit bias towards smooth interpolations leads to low generalization error in the overparameterized setting. We give the first case study of this connection through a random Fourier series model and weighted least squares. We then argue, through this model and numerical experiments, that normalization methods in deep learning, such as weight normalization, improve generalization in overparameterized neural networks by implicitly encouraging smooth interpolants. This is joint work with Yuege (Gail) Xie, Holger Rauhut, and Hung-Hsu Chou.
Rachel Ward

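The weighted-least-squares mechanism is easy to see in a toy Fourier-feature interpolation problem: among all coefficient vectors that fit the samples, a weighted minimum-norm solution that charges more for high frequencies picks a smoother interpolant than the unweighted one. This is a generic illustration in the spirit of the abstract, not the paper's model; the sizes and weights are arbitrary.

```python
import numpy as np

# Interpolate n samples with an overparameterized Fourier feature model and
# compare the plain min-norm interpolant with a weighted one that penalizes
# high frequencies more heavily.
rng = np.random.default_rng(0)
n, K = 10, 100                                   # 10 samples, 201 features
t = np.sort(rng.uniform(0, 1, n))
y = np.sin(2 * np.pi * t)

freqs = np.arange(K + 1)
Phi = np.concatenate([np.cos(2 * np.pi * np.outer(t, freqs)),
                      np.sin(2 * np.pi * np.outer(t, freqs[1:]))], axis=1)

def min_weighted_norm(Phi, y, w):
    """argmin ||diag(w) c||_2  subject to  Phi c = y (underdetermined system)."""
    Wi2 = 1.0 / w**2
    G = (Phi * Wi2) @ Phi.T                      # Phi W^{-2} Phi^T
    return Wi2 * (Phi.T @ np.linalg.solve(G, y))

w_flat = np.ones(Phi.shape[1])                               # plain min-norm
w_smooth = np.concatenate([1.0 + freqs, 1.0 + freqs[1:]])    # penalize high freqs

c_flat = min_weighted_norm(Phi, y, w_flat)
c_smooth = min_weighted_norm(Phi, y, w_smooth)
# Both interpolate the data; the weighted solution concentrates its energy on
# low frequencies and is therefore smoother between the sample points.
```
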
Fri 4:15 p.m. - 4:25 p.m. | Q&A with Rachel Ward (Discussion)

Fri 4:25 p.m. - 5:00 p.m. | Break (until 5:00 p.m. PDT)

Fri 5:00 p.m. - 5:45 p.m. | Talk by Rio Yokota - Degree of Approximation and Overhead of Computing Curvature, Information, and Noise Matrices (Talk)
Hessian, Fisher, and covariance matrices are not only used for preconditioning optimizers, but also in generalization metrics, predicting hyperparameters, and Bayesian inference. These matrices contain valuable information that can advance theory in statistical learning, but they are very expensive to compute exactly for modern deep neural networks with billions of parameters. We make use of a highly optimized implementation for computing these matrices with various degrees of approximation to close the gap between theory and practice. We are able to significantly reduce the overhead of computing these matrices through a hybrid data-parallel + model-parallel approach.
Rio Yokota

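To make the objects in the title concrete, the snippet below forms one of them explicitly: the covariance of per-example gradients (the gradient "noise" matrix) for a tiny model. Forming it densely like this is only feasible at toy scale, which is exactly the overhead-versus-approximation trade-off the talk addresses; the model and data are placeholders.

```python
import torch

# Explicit covariance of per-example gradients for a tiny linear model.
torch.manual_seed(0)
model = torch.nn.Linear(5, 1)
xs, ys = torch.randn(64, 5), torch.randn(64, 1)

grads = []
for x, y in zip(xs, ys):
    model.zero_grad()
    loss = (model(x.unsqueeze(0)) - y).pow(2).mean()
    loss.backward()
    grads.append(torch.cat([p.grad.reshape(-1) for p in model.parameters()]))

G = torch.stack(grads)                                # (num_examples, num_params)
mean = G.mean(dim=0, keepdim=True)
cov = (G - mean).T @ (G - mean) / (G.shape[0] - 1)    # gradient noise covariance
print(cov.shape)
```
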
Fri 5:45 p.m. - 5:55 p.m. | Q&A with Rio Yokota (Discussion)

Fri 5:55 p.m. - 6:00 p.m. | Closing remarks (Discussion)

Author Information
Albert S Berahas (Lehigh University)
Amir Gholaminejad (University of California, Berkeley)
Anastasios Kyrillidis (Rice University)
Michael Mahoney (UC Berkeley)
Fred Roosta (University of Queensland)