We thank the reviewers for their careful engagement and comments. We will incorporate their feedback into the final version. Beyond clarifying the content in places, we will expand the experiment section as far as possible by including a feature importance heatmap and by considering datasets from different domains with different feature types and interactions (non-independence, non-uniform costs, etc.).

We now address the reviewers' direct questions.

(R1) The variables $\ell$ are indeed implicitly forced to be binary by the constraints. We keep them continuous in our practical formulation so that the MILP solver does not branch on them during the branch-and-bound search, which speeds up solving.

(R1) The treatment of extracted features is implicitly addressed in 188-198. We will be more explicit there.

(R2) Conditions i) and iii) are indeed not equivalent. Condition i) states that if $\ell_1 = 1$ then $\ell_2 = 0$, but it does not rule out all $\ell_i$ being zero. Condition iii) requires that some $\ell_i$ be non-zero. We will clarify.
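To make the (R2) point concrete, the following minimal sketch enumerates all binary assignments of two variables and lists those that satisfy condition i) but violate condition iii). The function names `cond_i` and `cond_iii` are hypothetical, and the two predicates encode only the paraphrases given in this response, not the full conditions as stated in the paper.

```python
from itertools import product

def cond_i(l):
    # Condition i) as paraphrased in this response (an assumption about its
    # exact form): if l[0] == 1 then l[1] == 0.
    return l[0] != 1 or l[1] == 0

def cond_iii(l):
    # Condition iii) as paraphrased in this response: some entry is non-zero.
    return any(v != 0 for v in l)

# Binary assignments that satisfy i) but violate iii).
gap = [l for l in product((0, 1), repeat=2) if cond_i(l) and not cond_iii(l)]
print(gap)  # [(0, 0)]
```

The all-zero assignment is the only one in the gap, which is exactly the case that separates the two conditions.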