Quantifying Decision Making for Data Science: From Data Acquisition to Modeling

Authors: 
Saurabh Nagrecha and Nitesh V Chawla
Citation: 
EPJ Data Science
Publication Date: 
August, 2016

Organizations, irrespective of their size and type, are increasingly becoming data-driven or aspire to become data-driven. There is a rush to quantify value of their own internal data or the value of integrating their internal data with external data, and performing modeling on such data. A question that analytics teams often grapple with is whether to acquire more data or expend additional effort on more complex modeling, or both. If these decisions can be quantified a priori, it can be used to guide budget and investment decisions. To that end, we quantify the Net Present Value (NPV) of the tasks of additional data acquisition or more complex modeling, which are critical to the data science process. We develop a framework, NPVModel, for a comparative analysis of various external data acquisition and in-house model development scenarios using NPVs of costs and returns as a measure of feasibility. We then demonstrate the effectiveness of NPVModel in prescribing strategies for various scenarios. Our framework not only acts as a suggestion engine, but it also provides valuable insights into budgeting and roadmap planning for Big Data ventures.