Splitting attributes is one of the main steps in the construction of decision trees. To choose the best split, impurity measures such as entropy and Gini are widely used. In practice, decision-tree inducers rely on heuristics to find low-impurity splits when they consider nominal attributes with a large number of distinct values. However, there are no known guarantees on the quality of the splits these heuristics produce. To fill this gap, we propose two new splitting procedures that provably achieve near-optimal impurity. We also report experiments providing evidence that the proposed methods are promising candidates for splitting nominal attributes with many values during decision tree and random forest induction.
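To make the objective concrete, the following is a minimal sketch of what "the impurity of a binary partition" of a nominal attribute can mean, assuming the size-weighted Gini impurity as the measure. It is not one of the two procedures proposed in the paper: it is a brute-force baseline that enumerates every binary partition of the attribute values, which takes exponential time in the number of distinct values and is exactly why heuristics or approximation procedures are needed. The function names and the `(value, label)` sample format are hypothetical, chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def gini(counts):
    """Gini impurity 1 - sum(p_i^2) for a mapping of class -> count."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def weighted_impurity(samples, left_values):
    """Size-weighted Gini impurity of the split induced by left_values.

    samples is an iterable of (attribute_value, class_label) pairs
    (a hypothetical format for this sketch). Samples whose attribute
    value is in left_values go left; all others go right.
    """
    left, right = Counter(), Counter()
    for value, label in samples:
        (left if value in left_values else right)[label] += 1
    n_left, n_right = sum(left.values()), sum(right.values())
    return n_left * gini(left) + n_right * gini(right)


def best_binary_partition(samples):
    """Exhaustively search all binary partitions of the attribute values.

    Returns (impurity, left_values) for a minimum-impurity partition, or
    None if the attribute has fewer than two distinct values. The search
    is exponential in the number of distinct values, so it is only
    feasible for small attributes.
    """
    values = sorted({v for v, _ in samples})
    best = None
    for k in range(1, len(values)):
        for subset in combinations(values, k):
            score = weighted_impurity(samples, set(subset))
            if best is None or score < best[0]:
                best = (score, set(subset))
    return best
```

For example, calling `best_binary_partition` on the samples `[("a", "x"), ("b", "y"), ("c", "x")]` separates value `b` from values `a` and `c`, since both sides of that split contain a single class.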
Eduardo Laber (PUC-Rio)
Marco Molinaro (PUC-Rio)
Felipe de A. Mello Pereira (PUC-Rio)
Related Events (a corresponding poster, oral, or spotlight)
2018 Oral: Binary Partitions with Approximate Minimum Impurity
Wed Jul 11th 03:30 -- 03:40 PM Room K11