Signal Strength Estimation in Logistic Regression Using Data Splitting
Abstract
Logistic regression is widely used in applications. When the dimension scales with the sample size, however, theory shows that the asymptotic behavior of common M-estimators depends on bias and variance scaling constants, which are functions of the signal strength. Using this theory to design statistical methodology therefore requires accurate estimates of the signal strength. In this work, we use a data-splitting strategy to estimate the signal strength efficiently. To alleviate issues caused by separable data, we analyze the exact asymptotics of an M-estimator with a data-driven, non-decomposable regularizer that adapts to the true covariance structure. We justify the validity of our method through both theoretical analysis and numerical experiments.