High-Performance Insights from the GPU Version of CatBoost
Abstract
In this paper we discuss the GPU implementation of the open-source gradient boosting library CatBoost. This implementation shows state-of-the-art performance among openly available libraries, and we want to share the design insights and algorithms behind it.
References
Leo Breiman, Jerome Friedman, Charles J Stone, and Richard A Olshen. 1984. Classification and regression trees. CRC Press.
Rich Caruana and Alexandru Niculescu-Mizil. 2006. An empirical comparison of supervised learning algorithms. In Proceedings of the 23rd international conference on Machine learning. ACM, 161–168.
Bojan Cestnik et al. 1990. Estimating probabilities: a crucial task in machine learning. In ECAI, Vol. 90. 147–149.
Tianqi Chen and Carlos Guestrin. 2016. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 785–794.
Duane Merrill and Michael Garland. 2016. Single-pass Parallel Prefix Scan with Decoupled Look-back. Technical Report. NVIDIA.
Jerome H Friedman. 2002. Stochastic gradient boosting. Computational Statistics & Data Analysis 38, 4 (2002), 367–378.
Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu. 2017. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. In Advances in Neural Information Processing Systems 30, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.). Curran Associates, Inc., 3146–3154. http://papers.nips.cc/paper/6907-lightgbm-a-highly-efficient-gradient-boosting-decision-tree.pdf
Daniele Micci-Barreca. 2001. A preprocessing scheme for high-cardinality categorical attributes in classification and prediction problems. ACM SIGKDD Explorations Newsletter 3, 1 (2001), 27–32.
Liudmila Prokhorenkova, Gleb Gusev, Aleksandr Vorobev, Anna Veronika Dorogush, and Andrey Gulin. 2018. CatBoost: unbiased boosting with categorical features. In Advances in Neural Information Processing Systems 31, S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (Eds.). Curran Associates, Inc., 6638–6648. http://papers.nips.cc/paper/7898-catboost-unbiased-boosting-with-categorical-features.pdf
Byron P Roe, Hai-Jun Yang, Ji Zhu, Yong Liu, Ion Stancu, and Gordon McGregor. 2005. Boosted decision trees as an alternative to artificial neural networks for particle identification. Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment 543, 2 (2005), 577–584.
Qiang Wu, Christopher JC Burges, Krysta M Svore, and Jianfeng Gao. 2010. Adapting boosting for information retrieval measures. Information Retrieval 13, 3 (2010), 254–270.
Yanru Zhang and Ali Haghani. 2015. A gradient boosting method to improve travel time prediction. Transportation Research Part C: Emerging Technologies 58 (2015), 308–324.
This work is licensed under a Creative Commons Attribution 4.0 International License.