Species distribution modeling and prediction: A class imbalance problem
Predicting species distributions is central to many applications in ecology and conservation biology. With growing interest in electronic occurrence records, many modeling techniques have been developed to use these data to compute the potential distribution of a species as a proxy for actual observations. Because actual observations are typically far outnumbered by non-occurrences, we approach species distribution modeling with a focus on the problem of class imbalance. Our analysis evaluates several machine learning methods that have been shown to address class imbalance but have rarely or never been applied to species distribution modeling. The evaluation includes the area under the precision-recall curve (AUPR), which can supplement other metrics to give a more informative assessment of model utility under class imbalance. We conclude that techniques that specifically address class imbalance can yield results, measured by the area under the receiver operating characteristic curve (AUROC) and by AUPR, that are competitive with traditional species distribution models.
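To make the contrast between AUROC and AUPR concrete, the following minimal Python sketch computes both on a made-up, highly imbalanced set of sites. The data, the scores, and the function names auroc and average_precision are illustrative assumptions and are not taken from the study; average precision is used here as one common step-wise estimate of AUPR.

```python
# Toy data (hypothetical): 100 sites, only 2 presences (label 1).
# Five absences are scored above the second presence; the rest score low.
labels = [1] + [0] * 5 + [1] + [0] * 93
scores = [0.9] + [0.8] * 5 + [0.6] + [0.1] * 93


def auroc(labels, scores):
    """Probability that a random presence outscores a random absence.

    Ties between a presence and an absence count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each presence.

    A step-wise estimate of AUPR; assumes no score ties between classes.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    n_pos = sum(labels)
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos


print(f"AUROC = {auroc(labels, scores):.3f}")              # AUROC = 0.974
print(f"AUPR  = {average_precision(labels, scores):.3f}")  # AUPR  = 0.643
```

In this toy case the five absences ranked above the second presence raise the false positive rate only to 5/98, so AUROC stays near 0.97, while precision at full recall drops to 2/7 and the average precision falls to about 0.64.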