Random Forest Density Estimation

Hongwei Wen · Hanyuan Hang

Hall E #1309

Keywords: [ MISC: Unsupervised and Semi-supervised Learning ] [ T: Learning Theory ]

Abstract: We propose a density estimation algorithm called random forest density estimation (RFDE), based on random trees in which each cell is split at the midpoint of a randomly chosen dimension. By combining the efficient random tree density estimation (RTDE) with an ensemble procedure, RFDE alleviates the boundary discontinuity from which partition-based density estimators suffer. On the theoretical side, we first prove fast convergence rates for RFDE when the density function lies in the Hölder space $C^{0,\alpha}$. Moreover, when the target function lies in the subspace $C^{1,\alpha}$, which contains smoother density functions, we explain for the first time the benefits of ensemble learning in density estimation. Specifically, we show that, in terms of convergence rates, the upper bound for the ensemble estimator RFDE is strictly smaller than the lower bound for its base estimator RTDE. In the experiments, we verify the theoretical results and show the promising performance of RFDE on both synthetic and real-world datasets. Finally, we evaluate RFDE on anomaly detection as a possible application.
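
The construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the data lie in the unit cube $[0,1)^d$, that every non-empty cell is split down to a fixed depth, and that the forest is a plain average of independently drawn trees. All function names and parameters (`build_tree`, `tree_density`, `fit_forest`, `forest_density`, `depth`, `n_trees`) are hypothetical.

```python
import numpy as np

def build_tree(points, lower, upper, depth, rng):
    """Split the cell [lower, upper) at the midpoint of a randomly chosen dimension."""
    # Stop at the target depth; empty cells need no further splitting
    # because their density estimate is zero either way.
    if depth == 0 or len(points) == 0:
        return {"count": len(points), "volume": float(np.prod(upper - lower))}
    dim = int(rng.integers(points.shape[1]))   # randomly chosen dimension
    mid = 0.5 * (lower[dim] + upper[dim])      # midpoint split
    in_left = points[:, dim] < mid
    left_upper = upper.copy()
    left_upper[dim] = mid
    right_lower = lower.copy()
    right_lower[dim] = mid
    return {
        "dim": dim,
        "mid": mid,
        "left": build_tree(points[in_left], lower, left_upper, depth - 1, rng),
        "right": build_tree(points[~in_left], right_lower, upper, depth - 1, rng),
    }

def tree_density(tree, x, n):
    """Histogram-style estimate from one tree: count / (n * cell volume)."""
    node = tree
    while "dim" in node:
        node = node["left"] if x[node["dim"]] < node["mid"] else node["right"]
    return node["count"] / (n * node["volume"])

def fit_forest(points, depth, n_trees, seed=0):
    """Draw n_trees independent random partitions of the unit cube (assumed domain)."""
    rng = np.random.default_rng(seed)
    d = points.shape[1]
    return [build_tree(points, np.zeros(d), np.ones(d), depth, rng)
            for _ in range(n_trees)]

def forest_density(forest, x, n):
    """Ensemble estimate: average of the single-tree estimates at x."""
    return float(np.mean([tree_density(tree, x, n) for tree in forest]))

# Usage on synthetic data supported on the unit square.
rng = np.random.default_rng(0)
sample = rng.beta(2.0, 5.0, size=(500, 2))
forest = fit_forest(sample, depth=4, n_trees=20)
print(forest_density(forest, np.array([0.2, 0.2]), len(sample)))
```

Each tree on its own yields a piecewise-constant estimate that jumps at cell boundaries. Averaging trees whose boundaries fall in different places softens those jumps, which is the boundary discontinuity effect the abstract refers to.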
