Optimizing Classifiers for Hypothetical Scenarios

R. Johnson, T. Raeder, and N. V. Chawla
The 19th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD)
Publication Date: May 2015

Abstract. The deployment of classification models is an integral component of many modern data mining and machine learning applications. A typical classification model is built with the tacit assumption that the deployment scenario by which it is evaluated is fixed and fully characterized. Yet, in the practical deployment of classification methods, important aspects of the application environment, such as the misclassification costs, may be uncertain during model building. Moreover, a single classification model may be applied in several different deployment scenarios. In this work, we propose a method to optimize a model for uncertain deployment scenarios. We begin by deriving a relationship between two evaluation measures, the H measure and cost curves, that may be used to address uncertainty in classifier performance. We show that when uncertainty in classifier performance is modeled as a probabilistic belief over this underlying relationship, a natural definition of risk emerges for both classifiers and instances. We then leverage this notion of risk to develop a boosting-based algorithm, which we call RiskBoost, that directly mitigates classifier risk, and we demonstrate that it outperforms AdaBoost on a diverse selection of datasets.
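The abstract positions RiskBoost against AdaBoost as the baseline. For readers unfamiliar with that baseline, the following is a minimal, self-contained sketch of standard AdaBoost with decision stumps (not the paper's RiskBoost method, whose risk-driven reweighting is defined in the body of the paper); all function names here are illustrative:

```python
import numpy as np

def adaboost_stumps(X, y, n_rounds=20):
    """Minimal AdaBoost sketch with decision stumps; labels y must be in {-1, +1}."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)            # uniform initial instance weights
    ensemble = []                       # list of (alpha, feature, threshold, polarity)
    for _ in range(n_rounds):
        best = None
        # Exhaustive stump search: each stump predicts pol on x[j] <= t, -pol otherwise.
        for j in range(d):
            for t in np.unique(X[:, j]):
                for pol in (1, -1):
                    pred = pol * np.where(X[:, j] <= t, 1, -1)
                    err = np.sum(w[pred != y])
                    if best is None or err < best[0]:
                        best = (err, j, t, pol)
        err, j, t, pol = best
        err = max(err, 1e-10)                        # avoid division by zero
        alpha = 0.5 * np.log((1.0 - err) / err)      # weak-learner vote weight
        pred = pol * np.where(X[:, j] <= t, 1, -1)
        w *= np.exp(-alpha * y * pred)               # up-weight misclassified instances
        w /= w.sum()
        ensemble.append((alpha, j, t, pol))
    return ensemble

def adaboost_predict(ensemble, X):
    """Weighted-majority vote of the learned stumps."""
    score = np.zeros(X.shape[0])
    for alpha, j, t, pol in ensemble:
        score += alpha * pol * np.where(X[:, j] <= t, 1, -1)
    return np.sign(score)
```

RiskBoost, as described in the paper, replaces AdaBoost's exponential-loss reweighting with an update driven by the instance-level risk derived from the H measure/cost curve relationship; the boosting loop structure itself is analogous.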