ICML 2019 Expo Talk
July 12, 2020
CatBoost — the new generation of Gradient Boosting
Anna Veronika Dorogush (Yandex)
Gradient boosting is a powerful machine-learning technique that achieves state-of-the-art results in a variety of practical tasks. For a number of years it has remained the primary method for learning problems with heterogeneous features, noisy data, and complex dependencies: web search, recommendation systems, weather forecasting, and many others.

CatBoost (http://catboost.yandex) is a popular open-source gradient boosting library with a whole set of advantages:

1. CatBoost can incorporate categorical features in your data (like music genre or city) with no additional preprocessing.
2. CatBoost has the fastest GPU and multi-GPU training implementations of all openly available gradient boosting libraries.
3. CatBoost predictions are 20-60 times faster than in other open-source gradient boosting libraries, which makes it possible to use CatBoost for latency-critical tasks.
4. CatBoost provides a variety of tools for analyzing your model.
5. CatBoost offers a set of efficient ranking modes that are actively used in production.

The talk will give a broad overview of gradient boosting and its areas of application, with examples of its production uses at Yandex. We will explain the differences between CatBoost and other openly available gradient boosting libraries, discuss the key techniques that allow CatBoost to achieve better results in a variety of practical tasks, and share lessons learned from scaling gradient boosting to the big-data problems we solve at Yandex.
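To illustrate the idea behind advantage 1, here is a minimal, stdlib-only sketch of ordered target statistics, the leakage-free categorical encoding that CatBoost is known for. This is a simplified illustration of the concept, not the library's actual implementation; the function name and the `prior`/`a` smoothing parameters are chosen for this example.

```python
def ordered_target_stats(categories, targets, prior=0.5, a=1.0):
    """Encode one categorical column with ordered target statistics.

    Each row is encoded using only the target values of *earlier* rows
    that share its category, smoothed toward a prior. Because a row never
    sees its own target, the encoding avoids target leakage.
    """
    sums = {}    # running sum of targets per category
    counts = {}  # running count of rows per category
    encoded = []
    for cat, y in zip(categories, targets):
        s = sums.get(cat, 0.0)
        c = counts.get(cat, 0)
        encoded.append((s + a * prior) / (c + a))  # smoothed mean of prior rows
        sums[cat] = s + y
        counts[cat] = c + 1
    return encoded

# Example: "rock" is encoded differently each time it appears,
# reflecting only the targets seen so far.
codes = ordered_target_stats(["rock", "rock", "jazz", "rock"], [1, 0, 1, 1])
# codes == [0.5, 0.75, 0.5, 0.5]
```

The first occurrence of every category falls back to the prior (0.5 here), and later occurrences converge toward the category's running target mean, which is why no separate preprocessing pass over the data is required.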