Political consultants and the campaigns they work for are poised to spend millions in the coming cycle on data mining. For some, that’s going to be a great decision. Others, unless they take the time to understand what data mining is and what it isn’t, just may wake up with a bad case of buyer’s remorse.
Data mining has its roots in the pragmatically oriented fields of business and computer science. Its main contribution has been the ability to use ever-expanding computing power to process large volumes of data and find important patterns and anomalies. Financial institutions, for example, have used these techniques to improve their detection of fraud in credit card transactions.
The idea is that with enough computing power, data, and variables thrown into a regression, eventually something “interesting” will emerge. The catch is that if you test enough variables, a few will line up with past results by chance alone. This serendipitous approach is why many statisticians dismissed data mining as a forecasting tool early on, warning, like those mutual fund notices, that “past results are not necessarily an indication of future performance.”
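To make the statisticians’ warning concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not anything drawn from a real campaign or vendor: the outcome and all 200 candidate variables are pure random noise, and the names (run_trial, average_over_trials) and sample sizes are hypothetical choices made for the example. The script picks the variable that best “explains” past results, then checks how that same variable does on fresh data.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def run_trial(rng, n_obs=50, n_vars=200):
    """Mine noise for the best past predictor, then test it on new data.

    Assumption: every variable AND the outcome are independent noise,
    so there is no real relationship to find.
    """
    past_outcome = [rng.gauss(0, 1) for _ in range(n_obs)]
    past_vars = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_vars)]

    # "Mining": keep whichever variable correlates most strongly with the past.
    past_r = [abs(pearson(v, past_outcome)) for v in past_vars]
    best_past_r = max(past_r)

    # "Forecasting": because the winning variable is noise, its next-cycle
    # values are simply a fresh independent draw, as is the next outcome.
    future_outcome = [rng.gauss(0, 1) for _ in range(n_obs)]
    future_var = [rng.gauss(0, 1) for _ in range(n_obs)]
    future_r = abs(pearson(future_var, future_outcome))
    return best_past_r, future_r


def average_over_trials(n_trials=20, seed=0):
    """Average the past and future correlation strengths over several trials."""
    rng = random.Random(seed)
    results = [run_trial(rng) for _ in range(n_trials)]
    mean_past = sum(p for p, _ in results) / n_trials
    mean_future = sum(f for _, f in results) / n_trials
    return mean_past, mean_future


if __name__ == "__main__":
    mean_past, mean_future = average_over_trials()
    print(f"average |r| of the best variable on past data:  {mean_past:.2f}")
    print(f"average |r| of that variable on future data:    {mean_future:.2f}")
```

The best variable looks impressive on the past only because it was chosen from 200 tries. On new data it typically does no better than any other piece of noise, which is the mutual fund disclaimer in numerical form.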
The problem, of course, is that political strategists are far less interested in a history lesson than they are in a crystal ball that will predict the future. As much as we may wish for data mining to be that crystal ball, that’s simply not what this tool is all about.
Statisticians are concerned with making inferences (generalizations) that can be used for making predictions. Professor David J. Hand, in his 2008 article for the International Journal of Forecasting, writes, “Forecasting is fundamentally an inferential problem. That is, it is not simply a question of summarizing data, but is rather a question of generalizing from the available data to new data — and in particular to new situations which are likely to arise in the future.” You can find interesting information in an almanac (data mining) but you need to understand meteorology (statistical inference) if you want to know what the weather will be next weekend.
Making predictions should be an “experimental” process, one guided by theory and methodological rules, and designed to help us make decisions about how best to use our resources. That is where we may be asking too much of data mining. Because data miners do not usually generate their own data, but work from compiled sources, they inherit whatever methodological flaws and biases are present in the compilations. The mining process itself is only as good as the software design. While those designs are getting better, process is no substitute for strategy, any more than message delivery vehicles are a substitute for message.
Ironically, as the analytic tools proliferate, strategic analysis appears to be on life support, more afterthought than guiding force. Some of the state-of-the-art miners like i360 have created great ancillary tools to help turn a descriptive discipline into a predictive one, but a lot of consultants don’t want tools; they want magic. And as Old Lodge Skins says in Little Big Man, “sometimes the magic works, and sometimes it doesn’t.”