Political consultants and the campaigns they work for are poised to spend millions in the coming cycle on data mining. For some, that’s going to be a great decision. Others, unless they take the time to understand what data mining is and what it isn’t, may wake up with a bad case of buyer’s remorse.
Data mining has its roots in the practically oriented fields of business and computer science. Its main contribution has been the ability to use ever-expanding computing power to process large volumes of data and find important patterns and anomalies. Financial institutions, for example, have used these techniques to improve their detection of fraud in credit card transactions.
The idea is that with enough computing power, data, and variables thrown into a regression, something “interesting” will eventually emerge. This serendipitous approach is why many statisticians dismissed data mining as a forecasting tool early on, warning, like those mutual fund notices, that “past results are not necessarily an indication of future performance.”
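To see why statisticians were wary, consider a minimal sketch in Python. It is an illustration under stated assumptions, not a description of any vendor's method: the outcome and every candidate variable are pure random noise, and the function name `strongest_correlation` and the sample sizes are made up for the example. The sketch simply screens more and more unrelated variables against the same outcome and reports the strongest correlation it stumbles on.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    spread_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (spread_x * spread_y)


def strongest_correlation(n_obs, n_vars, seed=0):
    """Largest |correlation| between a noise outcome and n_vars noise variables.

    Hypothetical helper for illustration: nothing here is related to
    anything else, so any "pattern" it finds is an accident of the sample.
    """
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_obs)]
    best = 0.0
    for _ in range(n_vars):
        candidate = [rng.gauss(0, 1) for _ in range(n_obs)]
        best = max(best, abs(pearson(candidate, outcome)))
    return best


if __name__ == "__main__":
    # 30 observations, screened against a growing pile of unrelated variables.
    for n_vars in (10, 100, 1000):
        r = strongest_correlation(n_obs=30, n_vars=n_vars)
        print(f"{n_vars:>5} noise variables -> strongest |r| = {r:.2f}")
```

The strongest correlation can only grow as more variables are screened, even though none of them has any real connection to the outcome. A pattern found this way describes the sample at hand and offers no assurance about new data.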
The problem, of course, is that political strategists are far less interested in a history lesson than they are in a crystal ball that will predict the future. As much as we may wish for data mining to be that crystal ball, that’s simply not what this tool is all about.
Statisticians are concerned with making inferences (generalizations) that can be used for making predictions. Professor David J. Hand, in his 2008 article for the International Journal of Forecasting, writes, “Forecasting is fundamentally an inferential problem. That is, it is not simply a question of summarizing data, but is rather a question of generalizing from the available data to new data — and in particular to new situations […]